Re: DIH replacement

2020-11-30 Thread Joel Bernstein
Check out this ticket:

https://issues.apache.org/jira/browse/SOLR-14673

There are lots of different ways that this could be applied as a
replacement for DIH.


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Nov 30, 2020 at 9:56 AM Erick Erickson wrote:

> For what I suggested, there’s no code to write, these streams exist
> already.
>
> As far as supporting the more complex cases… I’m -1 for adding special
> code to streaming. DIH has many moving parts. Each of those parts was put
> there for a reason, and needed to be supported through successive Solr
> releases. What I specifically do _not_ want to do is to start down the path
> of reproducing those parts with special-purpose streaming code that tries
> to replace DIH with equivalent streaming functionality.
>
> I think it’s kinder to end users to set expectations that they need to be
> responsible for the ETL process. If there are streaming capabilities that do
> the needful, they can certainly use them rather than write something
> themselves. Otherwise they need to create an independent ETL process.
>
> The origin of this thought was the realization that streaming can import
> from a DB as-is, one of the base use-cases for DIH. On a quick look, I
> don’t see any other streams that work with other data sources, say a
> TikaStream, a FileStream, etc...
>
> FWIW,
> Erick
>
>
> > On Nov 29, 2020, at 11:52 AM, Atri Sharma wrote:
> >
> > FWIW I am interested in this -- happy to collaborate
> >
> > On Sun, 29 Nov 2020, 22:07 Erick Erickson wrote:
> > How far can we get in replacing DIH with streams? I can write a simple
> > DIH implementation by wrapping a jdbc stream in an update stream for
> > instance (I think).
> >
> > It falls down with some of the more complex DIH constructs, but the
> > simple “pull data from the DB and insert it into Solr” case seems covered...
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >


Re: DIH replacement

2020-11-30 Thread Erick Erickson
For what I suggested, there’s no code to write, these streams exist already.

As far as supporting the more complex cases… I’m -1 for adding special code to 
streaming. DIH has many moving parts. Each of those parts was put there for a 
reason, and needed to be supported through successive Solr releases. What I 
specifically do _not_ want to do is to start down the path of reproducing those 
parts with special-purpose streaming code that tries to replace DIH with 
equivalent streaming functionality.

I think it’s kinder to end users to set expectations that they need to be 
responsible for the ETL process. If there are streaming capabilities that do the 
needful, they can certainly use them rather than write something themselves. 
Otherwise they need to create an independent ETL process.
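
A minimal sketch of what an "independent ETL process" could look like, assuming a relational source and Solr's JSON update handler as the target. Everything named here is hypothetical and not from the thread: the `products` table, the `name_s`/`price_f` field names (which assume matching dynamic fields in the schema), and the helper functions. An in-memory SQLite database stands in for the real source so the sketch runs on its own. The load step is not executed; each payload is a JSON array of documents that could be POSTed to a collection's `/update` endpoint with `Content-Type: application/json`.

```python
import json
import sqlite3


def extract(conn, sql):
    """Yield each row of the query as a dict keyed by column name."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def transform(row):
    """Map a DB row to a Solr document (field names are hypothetical)."""
    return {"id": str(row["id"]), "name_s": row["name"], "price_f": row["price"]}


def batches(docs, size):
    """Group documents into lists of at most `size` items."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Stand-in source database; a real process would connect to the actual DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "widget", 2.5), (2, "gadget", 4.0), (3, "gizmo", 1.25)],
)

rows = extract(conn, "SELECT id, name, price FROM products ORDER BY id")
# One JSON payload per batch, ready to send to the update handler.
payloads = [json.dumps(b) for b in batches(map(transform, rows), size=2)]
```

The point of the split into extract, transform and batch steps is that the user owns each stage, which is the expectation being set above.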

The origin of this thought was the realization that streaming can import from a 
DB as-is, one of the base use-cases for DIH. On a quick look, I don’t see any 
other streams that work with other data sources, say a TikaStream, a 
FileStream, etc...

FWIW,
Erick


> On Nov 29, 2020, at 11:52 AM, Atri Sharma wrote:
> 
> FWIW I am interested in this -- happy to collaborate
> 
> On Sun, 29 Nov 2020, 22:07 Erick Erickson wrote:
> How far can we get in replacing DIH with streams? I can write a simple DIH 
> implementation by wrapping a jdbc stream in an update stream for instance (I 
> think).
> 
> It falls down with some of the more complex DIH constructs, but the simple 
> “pull data from the DB and insert it into Solr” case seems covered...
> 





Re: DIH replacement

2020-11-29 Thread Atri Sharma
FWIW I am interested in this -- happy to collaborate

On Sun, 29 Nov 2020, 22:07 Erick Erickson wrote:

> How far can we get in replacing DIH with streams? I can write a simple DIH
> implementation by wrapping a jdbc stream in an update stream for instance
> (I think).
>
> It falls down with some of the more complex DIH constructs, but the simple
> “pull data from the DB and insert it into Solr” case seems covered...


DIH replacement

2020-11-29 Thread Erick Erickson
How far can we get in replacing DIH with streams? I can write a simple DIH 
implementation by wrapping a jdbc stream in an update stream for instance (I 
think).

It falls down with some of the more complex DIH constructs, but the simple 
“pull data from the DB and insert it into Solr” case seems covered...
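
A rough sketch of the "jdbc stream wrapped in an update stream" idea, written as a small Python helper that only builds the streaming expression text. The parameter names (`connection`, `sql`, `sort`, `driver`, `batchSize`) follow the `jdbc` and `update` expressions as documented in the Solr Reference Guide; the collection name, JDBC URL, table, and the helper functions themselves are hypothetical. The helper does no escaping, so it assumes the SQL contains no double quotes.

```python
def jdbc_stream(connection, sql, sort, driver=None):
    """Render a jdbc(...) streaming expression as text."""
    params = [f'connection="{connection}"', f'sql="{sql}"', f'sort="{sort}"']
    if driver:
        params.append(f'driver="{driver}"')
    return "jdbc(" + ", ".join(params) + ")"


def update_stream(collection, inner, batch_size=500):
    """Wrap another expression in update(...), which indexes its tuples."""
    return f"update({collection}, batchSize={batch_size}, {inner})"


# Hypothetical source database and destination collection.
expr = update_stream(
    "products",
    jdbc_stream(
        connection="jdbc:postgresql://dbhost:5432/shop",
        sql="SELECT id, name, price FROM products ORDER BY id",
        sort="id asc",
        driver="org.postgresql.Driver",
    ),
)
print(expr)
```

The resulting text would be sent as the `expr` parameter to a collection's `/stream` handler. This assumes the JDBC driver jar is on Solr's classpath and that commits are handled separately, for example by autoCommit settings.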