Does the DIH delta feature rewrite the delta-import file for each set of rows? 
If it does not, that sounds like a bug/enhancement. 
Lance

-----Original Message-----
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2008 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed id 
feature

You can write the details to a file using a Transformer itself.

It is wise to stick to the public API as far as possible. We will maintain 
backward compatibility, so your code will stay usable with newer versions.
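
A minimal sketch of what such a Transformer could look like. It assumes that
DIH accepts any class exposing a public transformRow(Map<String, Object>)
method, so the class compiles without Solr on the classpath. The class name,
the checkpoint file name, the property key, and the "id" column are all
hypothetical. It writes to its own file rather than to dataimport.properties,
and it rewrites that file for every row, which is simple but slow. A row
passing through the transformer has also not necessarily been committed to
the index yet, so the saved id is only a hint.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

// Hypothetical transformer: name, file, key and "id" column are assumptions.
class LastIndexedIdTransformer {
    // Assumed checkpoint location; deliberately not dataimport.properties.
    static Path checkpointFile = Paths.get("last-indexed-id.properties");
    static final String KEY = "last_indexed_id";

    // Called once per row; records the row's id and returns the row unchanged.
    public Object transformRow(Map<String, Object> row) {
        Object id = row.get("id");
        if (id != null) {
            save(String.valueOf(id));
        }
        return row;
    }

    // Overwrites the checkpoint file with the given id.
    static void save(String id) {
        Properties props = new Properties();
        props.setProperty(KEY, id);
        try (OutputStream out = Files.newOutputStream(checkpointFile)) {
            props.store(out, "last id seen by LastIndexedIdTransformer");
        } catch (IOException e) {
            throw new RuntimeException("could not write " + checkpointFile, e);
        }
    }

    // Returns the saved id, or null if no checkpoint has been written yet.
    static String load() {
        if (!Files.exists(checkpointFile)) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(checkpointFile)) {
            props.load(in);
        } catch (IOException e) {
            throw new RuntimeException("could not read " + checkpointFile, e);
        }
        return props.getProperty(KEY);
    }
}
```

The class would be named in the entity's transformer attribute in the data
config, and a custom query could read the saved id back with load() before
the next run.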


On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>
> Thanks, I really appreciate your help.
>
> I didn't explain myself very well here:
>
>> 2.-This is probably my most difficult goal.
>> Delta-import reads a timestamp from dataimport.properties and 
>> modifies/adds all documents from the db which were inserted after that date. 
>> What I want is to be able to save in the field the id of the last 
>> indexed doc, so that the next time I execute the indexer it starts 
>> indexing from that last indexed doc id.
> You can use a Transformer to write something to the DB.
> Context#getDataSource(String) for each row
>
> When I said:
>
>> be able to save in the field the id of the last indexed doc
> I made a mistake; I meant:
>
> be able to save in the file (dataimport.properties) the id of the last 
> indexed doc.
> The point would be to run my own deltaQuery, indexing from the last 
> indexed doc id instead of from the timestamp.
> So I think this would not work in that case (my mistake, because of 
> the bad explanation):
>
>>You can use a Transformer to write something to the DB.
>>Context#getDataSource(String) for each row
>
> That is why I was saying:
>> I think I should begin by modifying SolrWriter.java and DocBuilder.java,
>> creating functions like getStartTime, persistStartTime... for ID 
>> control
>
> Am I heading in the right direction?
> Sorry for my English, and thanks in advance.
>
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese 
>> <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hey there,
>>>
>>> I have my DataImportHandler almost completely configured. I am 
>>> missing three goals. I don't think I can reach them just via the XML 
>>> config or a Transformer and SqlEntityProcessor plugin, but I need to be 
>>> sure of that.
>>> If there's no other way I will hack some Solr source classes, and would 
>>> like to know the best way to do that. Once I have it solved, I can 
>>> upload or post the source in the forum in case someone thinks it can 
>>> be helpful.
>>>
>>> 1.- Every time I execute DataImportHandler (to index data from a 
>>> db), at the start or at the end I need to delete some expired 
>>> documents. I have to delete them from the database and from the 
>>> index. I know which documents must be deleted because a field in 
>>> the db marks them. I would rather not delete everything from the DB 
>>> first or everything from the index first, but delete each document 
>>> from the index and from the DB together, one at a time.
>>
>> You can override the init() and destroy() methods of SqlEntityProcessor 
>> and use it as the processor for the root entity. At that point you can 
>> run the necessary db queries and Solr delete queries. Look at
>> Context#getSolrCore() and Context#getDataSource(String).
>>
>>
>>> The "delete mark" is set by an update to the db row, so I think I 
>>> could use delta-import. I don't know if deletedPkQuery is the way to do 
>>> that; I cannot find much information about how to make it work. As 
>>> deltaQuery modifies docs (deletes the old and inserts the new), I suppose 
>>> there must be an easy way to do just the delete without the new 
>>> insert.
>> deletedPkQuery does everything first. It runs the query and uses that 
>> to identify the deleted rows.
>>>
>>> 2.-This is probably my most difficult goal.
>>> Delta-import reads a timestamp from dataimport.properties and 
>>> modifies/adds all documents from the db which were inserted after that date. 
>>> What I want is to be able to save in the field the id of the last 
>>> indexed doc, so that the next time I execute the indexer it starts 
>>> indexing from that last indexed doc id.
>> You can use a Transformer to write something to the DB.
>> Context#getDataSource(String) for each row
>>
>>> The point of doing this is that if I do a full import from a db with 
>>> lots of rows, the app could hit a problem in the middle of the 
>>> execution and abort the process. The way deltaQuery works, I would have to 
>>> restart the execution from the beginning. With this new 
>>> functionality I could optimize the index and start from the last 
>>> indexed doc.
>>> I think I should begin modifying the SolrWriter.java and DocBuilder.java.
>>> Creating functions like getStartTime, persistStartTime... for ID 
>>> control
>>>
>>> 3.-I commented on this last point before. I want to boost 
>>> doc fields at indexing time.
>>>>>Adding fieldboost is a planned item.
>>>
>>>>>It must work as follows.
>>>>>Add a special value $fieldBoost.<fieldname> to the row map
>>>
>>>>>And DocBuilder should respect that. You can raise a bug and we can 
>>>>>commit it soon.
>>> How do I raise a bug?
>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>>
>>> Thanks in advance
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-
>>> db--lastIndexed-id-feature-tp20788755p20788755.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db
> --lastIndexed-id-feature-tp20788755p20790542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



--
--Noble Paul
