RE: Delta Import with something other than Date

2010-09-12 Thread Ephraim Ofir
Alternatively, you could use the deltaQuery to retrieve the last indexed
id from the DB (you'd have to save it there on your previous import).
Your entity would look something like:
<entity name="my_entity"
        deltaQuery="SELECT MAX(id) AS last_id_value FROM last_id_table"
        deltaImportQuery="SELECT * FROM my_table WHERE id > ${dataimporter.delta.last_id_value}"
        ... >
    <field ... />
</entity>

You could implement your deltaImportQuery as a stored procedure which
would store the appropriate id in last_id_table (for the next
delta-import) in addition to returning the data from the query.
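
A minimal sketch of that bookkeeping, outside of DIH, may make the idea easier to follow. It uses Python with an in-memory SQLite database purely for illustration; the function name delta_import and the column layout of my_table are assumptions, and a real setup would do the second half inside the stored procedure on the actual database.

```python
import sqlite3

def delta_import(conn):
    """Return rows newer than the saved id, then advance the saved id."""
    cur = conn.cursor()
    # Same role as the deltaQuery: read the id saved by the previous run.
    (last_id,) = cur.execute(
        "SELECT MAX(id) AS last_id_value FROM last_id_table"
    ).fetchone()
    last_id = last_id or 0  # empty table: nothing has been imported yet
    # Same role as the deltaImportQuery: only rows appended since then.
    rows = cur.execute(
        "SELECT * FROM my_table WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    if rows:
        # Save the highest id returned so the next delta starts after it.
        cur.execute("INSERT INTO last_id_table (id) VALUES (?)", (rows[-1][0],))
        conn.commit()
    return rows

# Hypothetical demo data: two rows, one delta, one more row, another delta.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE last_id_table (id INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (2, "b")])
first = delta_import(conn)
conn.execute("INSERT INTO my_table VALUES (3, 'c')")
second = delta_import(conn)
```

Here the first call returns both existing rows and the second returns only the row appended afterwards.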

Ephraim Ofir






Re: Delta Import with something other than Date

2010-09-10 Thread Alexey Serba
> Can you provide a sample of passing the parameter via URL? And how using it
> would look in the data-config.xml

http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters


Re: Delta Import with something other than Date

2010-09-09 Thread Shawn Heisey

On 9/8/2010 4:32 PM, David Yang wrote:
> I have a table that I want to index, and the table has no datetime
> stamp. However, the table is append only so the primary key can only go
> up. Is it possible to store the last primary key, and use some delta
> query="select id where id > ${last_id_value}"



I ran into the same thing.  I track this in my build scripts and simply 
pass min and max ID variables via the dataimport URL; data-config.xml 
then plugs them into my SQL statement.  When I asked about it on the 
list, someone important told me to file a Jira on making it generic; 
it is SOLR-1920.


https://issues.apache.org/jira/browse/SOLR-1920
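
As an illustration only, the tracking a build script has to do can be as small as the following Python sketch. The state file, its JSON layout and both function names are hypothetical; nothing here is taken from the actual build scripts mentioned in this message.

```python
import json
import os
import tempfile

def load_last_max_id(state_path):
    """Return the max ID recorded by the previous import, or 0 if none."""
    try:
        with open(state_path) as f:
            return json.load(f)["last_max_id"]
    except FileNotFoundError:
        return 0  # no previous run: the next import starts from the beginning

def save_last_max_id(state_path, max_id):
    """Record the upper bound of an import that has completed."""
    with open(state_path, "w") as f:
        json.dump({"last_max_id": max_id}, f)

# Hypothetical run: a first build up to ID 1000, then a delta up to ID 1500.
state = os.path.join(tempfile.mkdtemp(), "last_id.json")
window1 = (load_last_max_id(state), 1000)
save_last_max_id(state, 1000)  # call this only after the import succeeded
window2 = (load_last_max_id(state), 1500)
```

Each window is a (min, max) pair to pass on the dataimport URL. Saving the new bound only after a successful import means a failed run is simply retried with the same window.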

Thanks,
Shawn



RE: Delta Import with something other than Date

2010-09-09 Thread Vladimir Sutskever
Shawn,

Can you provide a sample of passing the parameter via URL? And how using it 
would look in the data-config.xml

Thanks!

-Vladimir



Re: Delta Import with something other than Date

2010-09-09 Thread Shawn Heisey

On 9/9/2010 1:23 PM, Vladimir Sutskever wrote:
> Shawn,
>
> Can you provide a sample of passing the parameter via URL? And how using it
> would look in the data-config.xml



Here's the URL that I send to do a full build on my last shard:

http://idxst5-a:8983/solr/build/dataimport?command=full-import&optimize=true&commit=true&dataTable=ncdat&numShards=6&modVal=5&minDid=0&maxDid=242895591

If I want to do a delta, I just change the command to delta-import and 
give it a proper minDid value, rather than 0.
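
A short Python sketch of how a script might assemble both URLs, assuming the parameters are joined with & as in any query string. The function name dataimport_url and the delta bounds in the second call are made up for illustration; the host, path and parameter names are the ones from this message.

```python
from urllib.parse import urlencode

BASE = "http://idxst5-a:8983/solr/build/dataimport"

def dataimport_url(command, min_did, max_did, base=BASE):
    """Build a dataimport URL carrying the custom request parameters."""
    params = [
        ("command", command),
        ("optimize", "true"),
        ("commit", "true"),
        ("dataTable", "ncdat"),
        ("numShards", 6),
        ("modVal", 5),
        ("minDid", min_did),
        ("maxDid", max_did),
    ]
    return base + "?" + urlencode(params)

# Full build of the last shard, as in the URL above.
full_url = dataimport_url("full-import", 0, 242895591)
# Delta: same parameters, different command and a real lower bound.
# The upper bound 242900000 is a hypothetical value.
delta_url = dataimport_url("delta-import", 242895591, 242900000)
```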


Below is the entity from my data-config.xml.  You have to have a 
deltaQuery defined for delta-import to work, but if you're going to use 
your own placeholders, just put something in that returns a single value 
very quickly.  In my case, my query and deltaImportQuery are actually 
identical.


<entity name="dataTable" pk="did"
  query="SELECT *,FROM_UNIXTIME(post_date) as pd FROM
    ${dataimporter.request.dataTable} WHERE did &gt;
    ${dataimporter.request.minDid} AND did &lt;=
    ${dataimporter.request.maxDid} AND (did %
    ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})"
  deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
  deltaImportQuery="SELECT *,FROM_UNIXTIME(post_date) as pd FROM
    ${dataimporter.request.dataTable} WHERE did &gt;
    ${dataimporter.request.minDid} AND did &lt;=
    ${dataimporter.request.maxDid} AND (did %
    ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})">
</entity>
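
To see which rows the WHERE clause in this entity picks for a given shard, here is a small Python mirror of the same condition: did greater than minDid, at most maxDid, and did modulo numShards equal to modVal. The function name and the ID range 1 to 24 are illustrative assumptions, and modVal is treated as a single value, as in the URL above.

```python
def belongs_to_shard(did, num_shards, mod_val, min_did, max_did):
    """Mirror of: did > minDid AND did <= maxDid AND (did % numShards) IN (modVal)."""
    return min_did < did <= max_did and did % num_shards == mod_val

# Full build of shard 5 of 6 over a toy ID range.
full_build = [d for d in range(1, 25) if belongs_to_shard(d, 6, 5, 0, 24)]
# Delta: same shard, but only IDs above the previously imported maximum.
delta = [d for d in range(1, 25) if belongs_to_shard(d, 6, 5, 12, 24)]
```

With these toy values the full build selects 5, 11, 17 and 23, and the delta selects only 17 and 23.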




Re: Delta Import with something other than Date

2010-09-08 Thread Jonathan Rochkind
Of course you can store whatever you want in a Solr index. And if you 
store an integer as a Solr 1.4 "int" type, you can certainly query for 
all documents whose value in that field is greater than some specified 
integer.
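
For example, such a query can be written as an open-ended range. The Python sketch below only builds the query string; the field name did is borrowed from elsewhere in this thread and is assumed to be indexed as an integer field.

```python
from urllib.parse import urlencode

def greater_than_query(field, value):
    """Solr range query matching integer values strictly greater than value."""
    # [a TO *] is inclusive at the lower end, so start at value + 1.
    return f"{field}:[{value + 1} TO *]"

q = greater_than_query("did", 100)
# URL-encoded parameters to append to a select URL.
params = urlencode({"q": q, "rows": 10})
```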


You can't use SQL to query Solr though.

I'm not sure what you're really asking?

Jonathan

David Yang wrote:
> Hi,
>
> I have a table that I want to index, and the table has no datetime
> stamp. However, the table is append only so the primary key can only go
> up. Is it possible to store the last primary key, and use some delta
> query="select id where id > ${last_id_value}"
>
> Cheers,
>
> David



RE: Delta Import with something other than Date

2010-09-08 Thread David Yang
Currently DIH delta import uses an SQL query of the form
select id from item where last_modified > ${dataimporter.last_index_time}
(see wiki.apache.org/solr/DataImportHandler).
What I need is some field like ${dataimporter.last_primary_key}.
I am thinking of storing the last primary key externally, calling the
delta-import with a parameter, and using
${dataimporter.request.last_primary_key}, but that seems like a very
brittle approach.

Cheers,
David



Re: Delta Import with something other than Date

2010-09-08 Thread Lance Norskog

https://issues.apache.org/jira/browse/SOLR-1499

This is a patch (not committed) that queries a Solr instance and returns 
the values as a DIH document. This allows you to do a sort query to 
Solr, ask for the first result, and continue indexing after that. Scary, 
but it works.
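
The "sort and take the first result" query itself is ordinary Solr. The Python sketch below only builds the request parameters for it and is not the SOLR-1499 patch; the function name and the field name did are assumptions for illustration.

```python
from urllib.parse import urlencode

def highest_id_params(id_field):
    """Parameters asking Solr for the single document with the largest id."""
    return urlencode([
        ("q", "*:*"),                  # match every document
        ("sort", f"{id_field} desc"),  # largest value first
        ("rows", 1),                   # only the top hit is needed
        ("fl", id_field),              # return just the id field
    ])

params = highest_id_params("did")
```

The value of that field in the single returned document is the point after which indexing would continue.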


Lance



Re: Delta Import with something other than Date

2010-09-08 Thread Lukas Kahwe Smith

On 09.09.2010, at 00:44, David Yang wrote:

> Currently DIH delta import uses the SQL query of type select id from
> item where last_modified > ${dataimporter.last_index_time}
> What I need is some field like ${dataimporter.last_primary_key}
> wiki.apache.org/solr/DataImportHandler
> I am thinking of storing the last primary key externally and calling the
> delta-import with a parameter and using
> ${dataimporter.request.last_primary_key} but that seems like a very
> brittle approach


I am also using request parameters in my DIH import. We are not yet in 
production, but in our tests it worked fine.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org