RE: Delta Import with something other than Date
Alternatively, you could use the deltaQuery to retrieve the last indexed id from the DB (you'd have to save it there on your previous import). Your entity would look something like:

    <entity name="my_entity"
            deltaQuery="SELECT MAX(id) AS last_id_value FROM last_id_table"
            deltaImportQuery="SELECT * FROM my_table WHERE id &gt; ${dataimporter.delta.last_id_value}"
            ...>
      <field ... />
    </entity>

You could implement your deltaImportQuery as a stored procedure which would store the appropriate id in last_id_table (for the next delta-import) in addition to returning the data from the query.

Ephraim Ofir
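For anyone who wants to try the last_id_table bookkeeping outside of Solr first, here is a minimal Python sketch using an in-memory SQLite database. It is only an illustration of the logic, not DIH itself: the `body` column, the table layouts, and the `fetch_delta` helper are assumptions made up for the example, and a real setup would do the equivalent inside the stored procedure.

```python
import sqlite3


def fetch_delta(conn):
    """Return rows newer than the stored id, then advance the stored id."""
    cur = conn.cursor()
    # Counterpart of the deltaQuery: read the last indexed id (0 if none yet).
    last_id = cur.execute("SELECT MAX(id) FROM last_id_table").fetchone()[0] or 0
    # Counterpart of the deltaImportQuery: rows appended since that id.
    rows = cur.execute(
        "SELECT id, body FROM my_table WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    if rows:
        # Remember the highest id returned, for the next delta-import.
        cur.execute("INSERT INTO last_id_table (id) VALUES (?)", (rows[-1][0],))
        conn.commit()
    return rows


# Hypothetical schema: an append-only table plus the bookkeeping table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE last_id_table (id INTEGER);
INSERT INTO my_table (body) VALUES ('a'), ('b');
""")

first = fetch_delta(conn)    # both existing rows
conn.execute("INSERT INTO my_table (body) VALUES ('c')")
second = fetch_delta(conn)   # only the newly appended row
third = fetch_delta(conn)    # nothing new
```

Because the table is append-only, storing the highest id returned is enough; nothing here handles updates or deletes of existing rows.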
Re: Delta Import with something other than Date
> Can you provide a sample of passing the parameter via URL? And how using it would look in the data-config.xml

http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
Re: Delta Import with something other than Date
On 9/8/2010 4:32 PM, David Yang wrote:
> I have a table that I want to index, and the table has no datetime stamp. However, the table is append only so the primary key can only go up. Is it possible to store the last primary key, and use some delta query=select id where id > ${last_id_value}

I ran into the same thing. I track this in my build scripts and simply pass min and max ID variables via the dataimport URL; data-config.xml then plugs them into my SQL statement. When I asked about it on the list, someone important told me to file a Jira issue on making it generic. It is SOLR-1920:

https://issues.apache.org/jira/browse/SOLR-1920

Thanks,
Shawn
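As a rough idea of what tracking this in a build script can look like, here is a small Python sketch. It is not Shawn's actual script: the JSON state file, its `last_max_id` key, and both helper names are hypothetical, and the current maximum id is assumed to come from a separate query against the source table.

```python
import json
import tempfile
from pathlib import Path


def next_window(state_file, current_max_id):
    """Return (min_id, max_id) for the next import; min_id is exclusive."""
    min_id = 0  # no state yet: start from the beginning
    if state_file.exists():
        min_id = json.loads(state_file.read_text())["last_max_id"]
    return min_id, current_max_id


def record_success(state_file, max_id):
    """Persist the upper bound only after the import has succeeded."""
    state_file.write_text(json.dumps({"last_max_id": max_id}))


# Demo with a throwaway directory; 1000 and 1250 stand in for SELECT MAX(id).
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "last_id.json"
    first = next_window(state, 1000)
    record_success(state, first[1])
    second = next_window(state, 1250)
```

Writing the state only after a successful import means a failed run is simply retried with the same lower bound.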
RE: Delta Import with something other than Date
Shawn,

Can you provide a sample of passing the parameter via URL? And how using it would look in the data-config.xml

Thanks!
-Vladimir
Re: Delta Import with something other than Date
On 9/9/2010 1:23 PM, Vladimir Sutskever wrote:
> Shawn, Can you provide a sample of passing the parameter via URL? And how using it would look in the data-config.xml

Here's the URL that I send to do a full build on my last shard:

http://idxst5-a:8983/solr/build/dataimport?command=full-import&optimize=true&commit=true&dataTable=ncdat&numShards=6&modVal=5&minDid=0&maxDid=242895591

If I want to do a delta, I just change the command to delta-import and give it a proper minDid value, rather than 0.

Below is the entity from my data-config.xml. You have to have a deltaQuery defined for delta-import to work, but if you're going to use your own placeholders, just put something in that returns a single value very quickly. In my case, my query and deltaImportQuery are actually identical.

    <entity name="dataTable" pk="did"
            query="SELECT *,FROM_UNIXTIME(post_date) as pd
                   FROM ${dataimporter.request.dataTable}
                   WHERE did &gt; ${dataimporter.request.minDid}
                   AND did &lt;= ${dataimporter.request.maxDid}
                   AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})"
            deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
            deltaImportQuery="SELECT *,FROM_UNIXTIME(post_date) as pd
                   FROM ${dataimporter.request.dataTable}
                   WHERE did &gt; ${dataimporter.request.minDid}
                   AND did &lt;= ${dataimporter.request.maxDid}
                   AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})">
    </entity>
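If the URL is assembled by a script, something along these lines keeps the parameters properly separated and escaped. This is a sketch only: the `dataimport_url` helper and its defaults are made up for illustration (the defaults just mirror the values in the URL above), and the minDid used for the delta call is a hypothetical value.

```python
from urllib.parse import urlencode


def dataimport_url(base, command, min_did, max_did,
                   table="ncdat", num_shards=6, mod_val=5):
    """Build a DIH request URL; each key is readable in data-config.xml
    as ${dataimporter.request.<key>}."""
    params = {
        "command": command,
        "optimize": "true",
        "commit": "true",
        "dataTable": table,
        "numShards": num_shards,
        "modVal": mod_val,
        "minDid": min_did,
        "maxDid": max_did,
    }
    return base + "?" + urlencode(params)


base = "http://idxst5-a:8983/solr/build/dataimport"
# Full build: start from 0.
full_url = dataimport_url(base, "full-import", 0, 242895591)
# Delta: same parameters, different command and a real lower bound
# (242000000 is a made-up value for the example).
delta_url = dataimport_url(base, "delta-import", 242000000, 242895591)
```

The resulting string can be fetched with any HTTP client; the sketch stops at building it so that it runs without a Solr instance.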
Re: Delta Import with something other than Date
Of course you can store whatever you want in a Solr index. And if you store an integer as a Solr 1.4 "int" type, you can certainly query for all documents that have a value greater than some specified integer in a field. You can't use SQL to query Solr, though.

I'm not sure what you're really asking?

Jonathan

David Yang wrote:
> Hi,
>
> I have a table that I want to index, and the table has no datetime stamp. However, the table is append only so the primary key can only go up. Is it possible to store the last primary key, and use some delta query=select id where id > ${last_id_value}
>
> Cheers,
> David
RE: Delta Import with something other than Date
Currently DIH delta import uses an SQL query of the type

    select id from item where last_modified > ${dataimporter.last_index_time}

What I need is some field like ${dataimporter.last_primary_key}

wiki.apache.org/solr/DataImportHandler

I am thinking of storing the last primary key externally, calling the delta-import with a parameter, and using ${dataimporter.request.last_primary_key}, but that seems like a very brittle approach.

Cheers,
David
Re: Delta Import with something other than Date
https://issues.apache.org/jira/browse/SOLR-1499

This is a patch (not committed) that queries a Solr instance and returns the values as a DIH document. This allows you to do a sort query to Solr, ask for the first result, and continue indexing after that. Scary, but it works.

Lance
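The same "sort, take the first result" idea can also be tried from an external script, without the patch. The Python sketch below is not SOLR-1499 itself; it only builds the select URL and parses a JSON response. It assumes the key field is called `id`, is single-valued, and is of a type that sorts numerically; the host, both helper names, and the sample response are made up for the example.

```python
import json
from urllib.parse import urlencode


def highest_id_query(select_url, id_field="id"):
    """URL asking Solr for the single document with the largest id."""
    params = {
        "q": "*:*",
        "sort": id_field + " desc",  # largest id first
        "rows": 1,                   # only the first result is needed
        "fl": id_field,              # return just the id field
        "wt": "json",
    }
    return select_url + "?" + urlencode(params)


def highest_id(response_text, id_field="id"):
    """Extract the id from the JSON response; 0 if the index is empty."""
    docs = json.loads(response_text)["response"]["docs"]
    return int(docs[0][id_field]) if docs else 0


url = highest_id_query("http://localhost:8983/solr/select")
# Hand-written sample response, standing in for what the HTTP call returns.
sample = '{"response": {"numFound": 3, "start": 0, "docs": [{"id": "242895591"}]}}'
last_id = highest_id(sample)
```

The value returned could then be passed as the lower bound of the next delta-import, so the index itself acts as the record of what has been imported.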
Re: Delta Import with something other than Date
On 09.09.2010, at 00:44, David Yang wrote:
> Currently DIH delta import uses the SQL query of type select id from item where last_modified > ${dataimporter.last_index_time}
> What I need is some field like ${dataimporter.last_primary_key}
> wiki.apache.org/solr/DataImportHandler
> I am thinking of storing the last primary key externally and calling the delta-import with a parameter and using ${dataimporter.request.last_primary_key} but that seems like a very brittle approach

I am also using request parameters in my DIH import. We are not yet in production, but in our tests it worked fine.

Regards,
Lukas Kahwe Smith
m...@pooteeweet.org