Re: DataImport TXT file entity processor

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
an EntityProcessor looks right to me. It may help us add more
attributes if needed.

PlainTextEntityProcessor looks like a good name. It can also be used
to read html etc.
--Noble
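For illustration, a data-config sketch of how such a processor might be wired up (the entity attributes and the plainText column name are assumptions, since the processor did not exist yet):

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8" />
      <document>
        <entity name="f" processor="FileListEntityProcessor"
                baseDir="/data/docs" fileName=".*txt" rootEntity="false" dataSource="null">
          <entity name="body" processor="PlainTextEntityProcessor"
                  url="${f.fileAbsolutePath}">
            <field column="plainText" name="text" />
          </entity>
        </entity>
      </document>
    </dataConfig>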

On Sat, Jan 24, 2009 at 12:37 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Sat, Jan 24, 2009 at 5:56 AM, Nathan Adams na...@umich.edu wrote:

 Is there a way to use the Data Import Handler to index non-XML (i.e. simple
 text) files (either via HTTP or the filesystem)?  I need to put the entire
 contents of a text file into a single field of a document, while the other
 fields are being pulled out of Oracle...


 Not yet. But I think it will be nice to have. Can you open an issue in Jira?

 I think importing from HTTP was something another user had asked for
 recently. How do you get the url/path of this text file? That would help
 decide if we need a Transformer or EntityProcessor for these tasks.
 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul


Re: Should I extend DIH to handle POST too?

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
That does not look like a great option. DIH looks like overkill for
this use case.


You can write a simple update handler to do that.
All you need to do is extend ContentStreamHandlerBase and
register it as an update handler.
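A minimal sketch of that idea (class and method names as on the trunk of the time; verify them against your Solr version, and transform() here is a hypothetical placeholder for your own XSL/mapping step):

    import java.io.Reader;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.util.ContentStream;
    import org.apache.solr.handler.ContentStreamHandlerBase;
    import org.apache.solr.handler.ContentStreamLoader;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class PostXmlHandler extends ContentStreamHandlerBase {
      @Override
      protected ContentStreamLoader newLoader(SolrQueryRequest req,
                                              final UpdateRequestProcessor processor) {
        return new ContentStreamLoader() {
          @Override
          public void load(SolrQueryRequest req, SolrQueryResponse rsp,
                           ContentStream stream) throws Exception {
            Reader body = stream.getReader();   // the raw POSTed XML
            AddUpdateCommand cmd = new AddUpdateCommand();
            cmd.solrDoc = transform(body);      // apply your own mapping here
            processor.processAdd(cmd);
          }
        };
      }

      // Hypothetical placeholder for the XSL/data-config style transformation.
      SolrInputDocument transform(Reader body) { return new SolrInputDocument(); }

      public String getDescription() { return "Accepts raw XML posts"; }
      public String getSource() { return ""; }
      public String getSourceId() { return ""; }
      public String getVersion() { return ""; }
    }

It would be registered in solrconfig.xml with something like <requestHandler name="/update/postxml" class="PostXmlHandler"/>.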


On Sat, Jan 24, 2009 at 12:34 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 There's another option. Using DIH with Solrj. Take a look at:

 https://issues.apache.org/jira/browse/SOLR-853

 There's a patch there but it hasn't been updated to trunk. A contribution
 would be most welcome.

 On Sat, Jan 24, 2009 at 3:11 AM, Gunaranjan Chandraraju 
 chandrar...@apple.com wrote:

 Hi
 I had earlier described my requirement of needing to 'post XMLs as-is' to
 SOLR and have it handled just as the DIH would do on import using the
 mapping in data-config.xml.  I got multiple answers for the 'post approach'
 - the top two being

 - Use SOLR CELL
 - Use SOLRJ

 In general I would like to keep all the 'data conversion' inside the
 SOLR-powered search system rather than having clients do the XSL
 transformation of the XML before sending it (the CELL approach).

 My question is: how should I design this?
  - A Tomcat servlet that provides this 'post' endpoint.  It accepts the XML
 over HTTP, transforms it and calls SOLRJ to update.  This is the same Tomcat
 that houses SOLR.
  - A SOLR handler (is this the right way?)
  - Take this a step further and implement it as an extension to DIH - a
 handler that will refer to the DIH data-config.xml and use the same
 transformation.  This way I can invoke an import for 'batched files' or do a
 'post' for the same XML with the same data-config mapping being applied.
  Maybe it can be a separate handler that just refers to the same
 data-config.xml and is not necessarily bundled with the DIH handler code.

 Looking for some advice.  If the DIH extension is the way to go then I
 would be happy to extend it and contribute that back to SOLR.

 Regards,
 Guna




 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul


Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Fergus McMenemie
Hello,

I am also a newbie and wanted to do almost exactly the same thing.
I was planning on doing the equivalent of:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity
          name="record"
          processor="XPathEntityProcessor"
          stream="false"
          rootEntity="false"                  ***changed***
          forEach="/record"
          url="${f.fileAbsolutePath}">
        <field column="ID" xpath="/record/@id" commonField="true"/>  ***change**
        <!-- Address -->
        <entity
            name="record_adr"
            processor="XPathEntityProcessor"
            stream="false"
            forEach="/record/address"
            url="${f.fileAbsolutePath}">
          <field column="address_street" xpath="/record/address/@street" />
          <field column="address_state"  xpath="/record/address//@state" />
          <field column="address_type"   xpath="/record/address//@type" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

ID is no longer unique within Solr; there would be multiple documents
with a given ID, one for each address. You can then search on an ID and get
the three addresses, and you can also search on an address more sensibly.

I have not been able to try this yet as other issues are still to be
dealt with.

Comments?

Hi
I may be completely off on this, being new to SOLR, but I am not sure
how to index related groups of fields in a document and preserve
their 'grouping'.  I would appreciate any help on this.  A detailed
description of the problem is below.

I am trying to index an entity that can have multiple occurrences in  
the same document - e.g. Address.  The address could be Shipping,  
Home, Office etc.   Each address element has multiple values in it  
like street, state etc.Thus each address element is a group with  
the state and street in one address element being related to each other.

It looks like this in my source xml

<record>
  <coreInfo id="123" ... />
  <address street="XYZ1" State="CA" ... type="home" />
  <address street="XYZ2" state="CA" ... type="Office" />
  <address street="XYZ3" state="CA" type="Other" />
</record>

I have setup my DIH to treat these as entities as below

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity
          name="record"
          processor="XPathEntityProcessor"
          stream="false"
          forEach="/record"
          url="${f.fileAbsolutePath}">
        <field column="ID" xpath="/record/@id" />
        <!-- Address -->
        <entity
            name="record_adr"
            processor="XPathEntityProcessor"
            stream="false"
            forEach="/record/address"
            url="${f.fileAbsolutePath}">
          <field column="address_street" xpath="/record/address/@street" />
          <field column="address_state"  xpath="/record/address//@state" />
          <field column="address_type"   xpath="/record/address//@type" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>


The problem is as follows.  DIH seems to treat these as entities but  
solr seems to flatten them out on indexing to fields in a document  
(losing the entity part).

So when I search for an ID, in the response all the street fields
are bunched together, followed by all the state fields, the type fields etc.
Thus I can't associate which street address corresponds to which
address type in the response.

What seems harder is this - say I need to query on 'Street' = XYZ1 and
type=Office.  This should NOT return a document, since the street for
the office address is XYZ2 and not XYZ1.  However, when I query for
address_street:XYZ1 AND address_type:Office I get back this document.

The problem seems to be that while DIH allows 'entities' within a  
document  the SOLR schema does not preserve them - it 'flattens' all  
of them out as indices for the document.

I could work around the problem by creating SOLR fields like
home_address_street and office_address_street and do some xpath
mapping.  However I don't want to do that, as we can have multiple
'other' addresses.  Also, I have other fields whose type is not as
easily distinguished as address.

As I mentioned, being new to SOLR I might have completely goofed on the
way to set this up - I'd much appreciate any direction on it. I am using
SOLR 1.3.

Regards,
Guna

-- 

===
Fergus 

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
Nesting an XPathEntityProcessor inside another XPathEntityProcessor
is possible only if a field in the XML is a filename/URL.
What is the purpose of nesting like this? Is it because you have
multiple addresses? The possible solutions are discussed elsewhere in
this thread.

On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie fer...@twig.me.uk wrote:
 Hello,

 I am also a newbie and wanted to do almost exactly the same thing.
 I was planning on doing the equivalent of:

 <dataConfig>
   <dataSource type="FileDataSource" encoding="UTF-8" />
   <document>
     <entity name="f" processor="FileListEntityProcessor"
             baseDir="***"
             fileName=".*xml"
             rootEntity="false"
             dataSource="null">
       <entity
           name="record"
           processor="XPathEntityProcessor"
           stream="false"
           rootEntity="false"                  ***changed***
           forEach="/record"
           url="${f.fileAbsolutePath}">
         <field column="ID" xpath="/record/@id" commonField="true"/>  ***change**
         <!-- Address -->
         <entity
             name="record_adr"
             processor="XPathEntityProcessor"
             stream="false"
             forEach="/record/address"
             url="${f.fileAbsolutePath}">
           <field column="address_street" xpath="/record/address/@street" />
           <field column="address_state"  xpath="/record/address//@state" />
           <field column="address_type"   xpath="/record/address//@type" />
         </entity>
       </entity>
     </entity>
   </document>
 </dataConfig>

 ID is no longer unique within Solr; there would be multiple documents
 with a given ID, one for each address. You can then search on an ID and get
 the three addresses, and you can also search on an address more sensibly.

 I have not been able to try this yet as other issues are still to be
 dealt with.

 Comments?

Hi
I may be completely off on this, being new to SOLR, but I am not sure
how to index related groups of fields in a document and preserve
their 'grouping'.  I would appreciate any help on this.  A detailed
description of the problem is below.

I am trying to index an entity that can have multiple occurrences in
the same document - e.g. Address.  The address could be Shipping,
Home, Office etc.   Each address element has multiple values in it
like street, state etc.Thus each address element is a group with
the state and street in one address element being related to each other.

It looks like this in my source xml

<record>
  <coreInfo id="123" ... />
  <address street="XYZ1" State="CA" ... type="home" />
  <address street="XYZ2" state="CA" ... type="Office" />
  <address street="XYZ3" state="CA" type="Other" />
</record>

I have setup my DIH to treat these as entities as below

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity
          name="record"
          processor="XPathEntityProcessor"
          stream="false"
          forEach="/record"
          url="${f.fileAbsolutePath}">
        <field column="ID" xpath="/record/@id" />
        <!-- Address -->
        <entity
            name="record_adr"
            processor="XPathEntityProcessor"
            stream="false"
            forEach="/record/address"
            url="${f.fileAbsolutePath}">
          <field column="address_street" xpath="/record/address/@street" />
          <field column="address_state"  xpath="/record/address//@state" />
          <field column="address_type"   xpath="/record/address//@type" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>


The problem is as follows.  DIH seems to treat these as entities but
solr seems to flatten them out on indexing to fields in a document
(losing the entity part).

So when I search for an ID, in the response all the street fields
are bunched together, followed by all the state fields, the type fields etc.
Thus I can't associate which street address corresponds to which
address type in the response.

What seems harder is this - say I need to query on 'Street' = XYZ1 and
type=Office.  This should NOT return a document, since the street for
the office address is XYZ2 and not XYZ1.  However, when I query for
address_street:XYZ1 AND address_type:Office I get back this document.

The problem seems to be that while DIH allows 'entities' within a
document  the SOLR schema does not preserve them - it 'flattens' all
of them out as indices for the document.

I could work around the problem by creating SOLR fields like
home_address_street and office_address_street and do some xpath
mapping.  However I don't want to do that, as we can have multiple
'other' addresses.  

Re: Master failover - seeking comments

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you look at the new in-built replication?
http://wiki.apache.org/solr/SolrReplication#head-0e25211b6ef50373fcc2f9a6ad40380c169a5397

It can help you decide where to replicate from at runtime. Look
at the snappull command; you can pass the masterUrl at the time of
replication.
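For example (URL form as on the SolrReplication wiki; host names are illustrative):

    http://slave_host:8983/solr/replication?command=snappull&masterUrl=http://master_host:8983/solr/replication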



On Fri, Jan 23, 2009 at 7:55 PM, edre...@ha edre...@homeaway.com wrote:

 Thanks for the response. Let me clarify things a bit.

 Regarding the Slaves:
 Our project is a web application. It is our desire to embed Solr into the
 web application.  The web applications are configured with a local embedded
 Solr instance configured as a slave, and a remote Solr instance configured
 as a master.

 We have a requirement for real-time updates to the Solr indexes.  Our
 strategy is to use the local embedded Solr instance as a read-only
 repository.  Any time a write is made, we will send it to the remote Master.
 Once a user pushes a write operation to the remote Master, all subsequent
 read operations for this user now are made against the Master for the
 duration of the session.  This approximates realtime updates and seems to
 work for our purposes.  Writes to our system are a small percentage of Read
 operations.

 Now, back to the original question.  We're simply looking for failover
 solution if the Master server goes down.  Oh, and we are using the
 replication scripts to sync the servers.



 It seems like you are trying to write to Solr directly from your front end
 application. This is why you are thinking of multiple masters. I'll let
 others comment on how easy/hard/correct the solution would be.


 Well, yes.  We have business requirements that want updates to Solr to be
 realtime, or as close to that as possible, so when a user changes something,
 our strategy was to save it to the DB and push it to the Solr Master as
 well.  Although, we will have a background application that will help ensure
 that Solr is in sync with the DB for times that Solr is down and the DB is
 not.



 But, do you really need to have live writes? Can they be channeled through
 a
 background process? Since you anyway cannot do a commit per-write, the
 advantage of live writes is minimal. Moreover you would need to invest a
 lot
 of time in handling availability concerns to avoid losing updates. If you
 log/record the write requests to an intermediate store (or queue), you can
 do with one master (with another host on standby acting as a slave).


 We do need to have live writes, as I mentioned above.  The concern you
 mention about losing live writes is exactly why we are looking at a Master
 Solr server failover strategy.  We thought about having a backup Solr server
 that is a Slave to the Master and could be easily reconfigured as a new
 Master in a pinch.  Our operations team has pushed us to come up with a
 solution that would be more seamless.  This is why we came up with a
 Master/Master solution where both Masters are also slaves to each other.




 To test this, I ran the following scenario.

 1) Slave 1 (S1) is configured to use M2 as its master.
 2) We push an update to M2.
 3) We restart S1, now pointing to M1.
 4) We wait for M1 to sync from M2
 5) We then sync S1 to M1.
 6) Success!


 How do you co-ordinate all this?


 This was just a test scenario I ran manually to see if the setup I described
 above would even work.

 Is there a Wiki page that outlines typical web application Solr deployment
 strategies?  There are a lot of questions on the forum about this type of
 thing (including this one).  For those who have expertise in this area, I'm
 sure there are many who could benefit from this (hint hint).

 As before, any comments or suggestions on the above would be much
 appreciated.

 Thanks,
 Erik
 --
 View this message in context: 
 http://www.nabble.com/Master-failover---seeking-comments-tp21614750p21625324.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: Random queries extremely slow

2009-01-24 Thread Alexander Ramos Jardim
Use multiple boxes, with a mirroring delay from one to another, like a
pipeline.

2009/1/22 oleg_gnatovskiy oleg_gnatovs...@citysearch.com


 Well this probably isn't the cause of our random slow queries, but might be
 the cause of the slow queries after pulling a new index. Is there anything
 we could do to reduce the performance hit we take from this happening?



 Otis Gospodnetic wrote:
 
  Here is one example: pushing a large newly optimized index onto the
  server.
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: oleg_gnatovskiy oleg_gnatovs...@citysearch.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, January 22, 2009 2:22:51 PM
  Subject: Re: Random queries extremely slow
 
 
  What are some things that could happen to force files out of the cache on
  a Linux machine? I don't know what kinds of events to look for...
 
 
 
 
  yonik wrote:
  
   On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy
   wrote:
    Hello. Our production servers are operating relatively smoothly most of
    the time running Solr with 19 million listings. However every once in a
    while the same query that used to take 100 milliseconds takes 6000.
  
   Anything else happening on the system that may have forced some of the
   index files out of operating system disk cache at these times?
  
   -Yonik
  
  
 
  --
  View this message in context:
 
 http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611240.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611454.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Alexander Ramos Jardim


Re: Results not appearing

2009-01-24 Thread Johnny X

They all appear in the stats admin page under the NumDocs & maxDocs fields.

I don't explicitly send a commit command, but my posting ends like this
(suggesting they are committed):

SimplePostTool: POSTing file 21166.xml
SimplePostTool: POSTing file 21169.xml
SimplePostTool: COMMITting Solr index changes..

I just tried re-posting all the documents set as text -- will that update
the current documents indexed? (bearing in mind the unique key, message-id,
will be included again)

When I try searching I still get 0 results for anything included in the
message-id and content fields, both of which should be indexed and returning
results...


Cheers for any help!


ryguasu wrote:
 
 These might be obvious, but:
 
 * I assume you did a Solr commit command after indexing, right?
 
 * If you are using the fieldtype definitions from the default
 schema.xml, then your string fields are not being analyzed, which
 means you should expect search results only if you enter the entire,
 exact value of one of the Message-ID or Date fields in your query. Is
 that your intention?
 
 And yes, your analysis of stored seems correct. Stored fields are
 those whose values you need back at query time, and indexed fields are
 those you can do queries on. For a few complications, see
 http://wiki.apache.org/solr/FieldOptionsByUseCase
 
 On Fri, Jan 23, 2009 at 8:04 PM, Johnny X jonathanwel...@gmail.com
 wrote:

 I've indexed my XML using the below in the schema:

   <field name="Message-ID" type="string" indexed="true" stored="true" required="true"/>
   <field name="Date" type="string" indexed="false" stored="true"/>
   <field name="From" type="string" indexed="false" stored="true"/>
   <field name="To" type="string" indexed="false" stored="true"/>
   <field name="Subject" type="string" indexed="false" stored="true"/>
   <field name="Mime-Version" type="string" indexed="false" stored="true"/>
   <field name="Content-Type" type="string" indexed="false" stored="true"/>
   <field name="Content-Transfer-Encoding" type="string" indexed="false" stored="true"/>
   <field name="X-From" type="string" indexed="false" stored="true"/>
   <field name="X-To" type="string" indexed="false" stored="true"/>
   <field name="X-cc" type="string" indexed="false" stored="true"/>
   <field name="X-bcc" type="string" indexed="false" stored="true"/>
   <field name="X-Folder" type="string" indexed="false" stored="true"/>
   <field name="X-Origin" type="string" indexed="false" stored="true"/>
   <field name="X-FileName" type="string" indexed="false" stored="true"/>
   <field name="Content" type="string" indexed="true" stored="true"/>

  <uniqueKey>Message-ID</uniqueKey>

 However searching via the Message-ID or Content fields returns 0 results.
 Using Luke I can still see that these fields are stored, however.

 Out of interest, by setting the other fields to just stored=true, can
 they
 be returned in a query as part of a search?


 Cheers.
 --
 View this message in context:
 http://www.nabble.com/Results-not-appearing-tp21637069p21637069.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Results-not-appearing-tp21637069p21640562.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Fergus,
XPathEntityProcessor can read multivalued fields easily, e.g.:
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity
          name="record"
          processor="XPathEntityProcessor"
          forEach="/record"
          url="${f.fileAbsolutePath}">
        <field column="ID" xpath="/record/@id" commonField="true"/>  ***change**
        <field column="address_street" xpath="/record/address/@street" />
        <field column="address_state"  xpath="/record/address/@state" />
        <field column="address_type"   xpath="/record/address/@type" />
      </entity>
    </entity>
  </document>
</dataConfig>


In this case address_street, address_state and address_type will each be
returned as a separate list while parsing. If you wish to put them into
multiple fields you can write a transformer that iterates through the lists
and puts the values into separate fields. If there are 3 address tags then
you get a List<String> for each field, where the length of the
list == 3. If an item is missing it will be added as a null.
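A rough sketch of such a transformer (per the DIH Transformer API; the "address" output column and the "|" separator are just illustrative choices):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class AddressTransformer extends Transformer {
      @Override
      @SuppressWarnings("unchecked")
      public Object transformRow(Map<String, Object> row, Context context) {
        // Each value is a List because the xpath matches several <address> elements.
        List<String> streets = (List<String>) row.get("address_street");
        List<String> states  = (List<String>) row.get("address_state");
        List<String> types   = (List<String>) row.get("address_type");
        if (streets == null) return row;
        List<String> combined = new ArrayList<String>();
        for (int i = 0; i < streets.size(); i++) {
          // Items at the same index belong to the same <address>; missing ones are null.
          combined.add(streets.get(i) + "|" + states.get(i) + "|" + types.get(i));
        }
        row.put("address", combined);
        return row;
      }
    }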

Ensure that the fields are marked as multiValued="true" in the
schema.xml; otherwise it does not return a List<String>. If there is
no corresponding mapping in schema.xml you can explicitly put it here
in the data-config.xml, e.g.:
<field column="address_state" multiValued="true" xpath="/record/address/@state" />


I saw the syntax '/record/address//@state'. '//' is not supported;
you will have to give the full path explicitly.
--Noble



On Sat, Jan 24, 2009 at 2:57 PM, Noble Paul നോബിള്‍  नोब्ळ्
noble.p...@gmail.com wrote:
 Nesting an XPathEntityProcessor inside another XPathEntityProcessor
 is possible only if a field in the XML is a filename/URL.
 What is the purpose of nesting like this? Is it because you have
 multiple addresses? The possible solutions are discussed elsewhere in
 this thread.

 On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie fer...@twig.me.uk wrote:
 Hello,

 I am also a newbie and wanted to do almost exactly the same thing.
 I was planning on doing the equivalent of:

 <dataConfig>
   <dataSource type="FileDataSource" encoding="UTF-8" />
   <document>
     <entity name="f" processor="FileListEntityProcessor"
             baseDir="***"
             fileName=".*xml"
             rootEntity="false"
             dataSource="null">
       <entity
           name="record"
           processor="XPathEntityProcessor"
           stream="false"
           rootEntity="false"                  ***changed***
           forEach="/record"
           url="${f.fileAbsolutePath}">
         <field column="ID" xpath="/record/@id" commonField="true"/>  ***change**
         <!-- Address -->
         <entity
             name="record_adr"
             processor="XPathEntityProcessor"
             stream="false"
             forEach="/record/address"
             url="${f.fileAbsolutePath}">
           <field column="address_street" xpath="/record/address/@street" />
           <field column="address_state"  xpath="/record/address//@state" />
           <field column="address_type"   xpath="/record/address//@type" />
         </entity>
       </entity>
     </entity>
   </document>
 </dataConfig>

 ID is no longer unique within Solr; there would be multiple documents
 with a given ID, one for each address. You can then search on an ID and get
 the three addresses, and you can also search on an address more sensibly.

 I have not been able to try this yet as other issues are still to be
 dealt with.

 Comments?

Hi
I may be completely off on this, being new to SOLR, but I am not sure
how to index related groups of fields in a document and preserve
their 'grouping'.  I would appreciate any help on this.  A detailed
description of the problem is below.

I am trying to index an entity that can have multiple occurrences in
the same document - e.g. Address.  The address could be Shipping,
Home, Office etc.   Each address element has multiple values in it
like street, state etc.Thus each address element is a group with
the state and street in one address element being related to each other.

It looks like this in my source xml

<record>
  <coreInfo id="123" ... />
  <address street="XYZ1" State="CA" ... type="home" />
  <address street="XYZ2" state="CA" ... type="Office" />
  <address street="XYZ3" state="CA" type="Other" />
</record>

I have setup my DIH to treat these as entities as below

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity
          name="record"
          processor="XPathEntityProcessor"
          stream="false"
          forEach="/record"

Re: Results not appearing

2009-01-24 Thread Johnny X

If it helps, everything appears when I use Luke to search through the
index...but the search in that returns nothing either.

When I search using the admin page for the word 'Phillip' (which appears the
most in all of the documents) I get the following:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">phillip</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" />
</response>


Duh...?



Johnny X wrote:
 
 They all appear in the stats admin page under the NumDocs & maxDocs
 fields.
 
 I don't explicitly send a commit command, but my posting ends like this
  (suggesting they are committed):
 
 SimplePostTool: POSTing file 21166.xml
 SimplePostTool: POSTing file 21169.xml
 SimplePostTool: COMMITting Solr index changes..
 
 I just tried re-posting all the documents set as text -- will that
 update the current documents indexed? (bearing in mind the unique key,
 message-id, will be included again)
 
 When I try searching I still get 0 results for anything included in the
 message-id and content fields, both of which should be indexed and
 returning results...
 
 
 Cheers for any help!
 
 
 ryguasu wrote:
 
 These might be obvious, but:
 
 * I assume you did a Solr commit command after indexing, right?
 
 * If you are using the fieldtype definitions from the default
 schema.xml, then your string fields are not being analyzed, which
 means you should expect search results only if you enter the entire,
 exact value of one of the Message-ID or Date fields in your query. Is
 that your intention?
 
 And yes, your analysis of stored seems correct. Stored fields are
 those whose values you need back at query time, and indexed fields are
 those you can do queries on. For a few complications, see
 http://wiki.apache.org/solr/FieldOptionsByUseCase
 
 On Fri, Jan 23, 2009 at 8:04 PM, Johnny X jonathanwel...@gmail.com
 wrote:

 I've indexed my XML using the below in the schema:

   <field name="Message-ID" type="string" indexed="true" stored="true" required="true"/>
   <field name="Date" type="string" indexed="false" stored="true"/>
   <field name="From" type="string" indexed="false" stored="true"/>
   <field name="To" type="string" indexed="false" stored="true"/>
   <field name="Subject" type="string" indexed="false" stored="true"/>
   <field name="Mime-Version" type="string" indexed="false" stored="true"/>
   <field name="Content-Type" type="string" indexed="false" stored="true"/>
   <field name="Content-Transfer-Encoding" type="string" indexed="false" stored="true"/>
   <field name="X-From" type="string" indexed="false" stored="true"/>
   <field name="X-To" type="string" indexed="false" stored="true"/>
   <field name="X-cc" type="string" indexed="false" stored="true"/>
   <field name="X-bcc" type="string" indexed="false" stored="true"/>
   <field name="X-Folder" type="string" indexed="false" stored="true"/>
   <field name="X-Origin" type="string" indexed="false" stored="true"/>
   <field name="X-FileName" type="string" indexed="false" stored="true"/>
   <field name="Content" type="string" indexed="true" stored="true"/>

  <uniqueKey>Message-ID</uniqueKey>

 However searching via the Message-ID or Content fields returns 0 results.
 Using Luke I can still see that these fields are stored, however.

 Out of interest, by setting the other fields to just stored=true, can
 they
 be returned in a query as part of a search?


 Cheers.
 --
 View this message in context:
 http://www.nabble.com/Results-not-appearing-tp21637069p21637069.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Results-not-appearing-tp21637069p21641692.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr-duplicate post management

2009-01-24 Thread S.Selvam Siva
On Thu, Jan 22, 2009 at 2:33 PM, S.Selvam Siva s.selvams...@gmail.comwrote:



 On Thu, Jan 22, 2009 at 7:12 AM, Chris Hostetter hossman_luc...@fucit.org
  wrote:


 : what i need is ,to log the existing urlid and new urlid(of course both
 will
 : not be same) ,when a .xml file of same id(unique field) is posted.
 :
 : I want to make this by modifying the solr source.Which file do i need to
 : modify so that i could get the above details in log ?
 :
  : I tried with DirectUpdateHandler2.java (which removes the duplicate
  : entries), but my efforts were in vain.

 DirectUpdateHandler2.java (on the trunk) delegates to Lucene-Java's
 IndexWriter.updateDocument method when you have a uniqueKey and you aren't
 allowing duplicates -- this method doesn't give you any way to access the
 old document(s) that had that existing key.

 The easiest way to make a change like what you are interested in might be
 an UpdateProcessor that does a lookup/search for the uniqueKey of each
 document about to be added to see if it already exists.  that's probably
 about as efficient as you can get, and would be nicely encapsulated.

 You might also want to take a look at SOLR-799, where some work is being
 done to create UpdateProcessors that can do near duplicate detection...

 http://wiki.apache.org/solr/Deduplication
 https://issues.apache.org/jira/browse/SOLR-799
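A minimal sketch of the UpdateProcessor approach Hoss describes (API names as on the 1.3/1.4 trunk; the "id" uniqueKey name and the stderr logging are illustrative):

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class LogDuplicateProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            // Look the incoming uniqueKey up before the add goes through,
            // while the old document is still searchable.
            String id = (String) cmd.solrDoc.getFieldValue("id");
            SolrIndexSearcher searcher = req.getSearcher();
            if (searcher.getFirstMatch(new Term("id", id)) >= 0) {
              System.err.println("duplicate add for id=" + id);
            }
            super.processAdd(cmd);
          }
        };
      }
    }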






 -Hoss




Hi, I added some code to *DirectUpdateHandler2.java's doDeletions()* (Solr
1.2.0) and got the solution I wanted (logging the duplicate post entry, i.e.
the old field and the new field of the duplicate post):


   Document d1 = searcher.doc(prev);         // existing doc, about to be deleted
   Document d2 = searcher.doc(tdocs.doc());  // new doc
   String oldname = d1.get("name");
   String id1 = d1.get("id");
   String newname = d2.get("name");
   String id2 = d2.get("id");                // equals id1, since duplicates share the key
   out3.write(id1 + "," + oldname + "," + newname + "\n");

But I don't know whether the performance of Solr will be affected by this.
Any comment on the performance impact of the above solution is welcome...
-- 
Yours,
S.Selvam


Re: faceting question

2009-01-24 Thread Cam Bazz
Is there no other way than to use the patch?

Since query A is a superset of query B?

If it is not doable, I will probably use some caching technique.

Best.

On Sat, Jan 24, 2009 at 9:14 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Sat, Jan 24, 2009 at 6:56 AM, Cam Bazz camb...@gmail.com wrote:

 Hello;

 I got a multiField named tagList which may contain multiple tags. I am
 making a query like:

 tagList:a AND tagList:b AND tagList:c

 and I am also getting a tagList facet returning me some values.

 What I would like is Solr to return me facets as if the query was:
 tagList:a AND tagList:b

 is it even possible?


 If I understand correctly,
 1. You want to query for tagList:a AND tagList:b AND tagList:c
 2. At the same time, you want to request facets for tagList but only for
 tagList:a and tagList:b

 If that is correct, you can use the features introduced by
 https://issues.apache.org/jira/browse/SOLR-911

 However you may need to put #1 as fq instead of q.
 --
 Regards,
 Shalin Shekhar Mangar.
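(A hedged illustration of what Shalin describes, using the filter tag/exclude local params from SOLR-911; the tag name "tc" is arbitrary:)

    q=*:*&fq=tagList:a&fq=tagList:b&fq={!tag=tc}tagList:c
    &facet=true&facet.field={!ex=tc}tagList

The facet counts for tagList are then computed as if the tagList:c filter were absent, i.e. for tagList:a AND tagList:b.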



Re: Results not appearing

2009-01-24 Thread Chris Harris
I should clarify that I misspoke before; I thought you had
indexed=true on Message-Id and Date, whereas you had it on
Message-Id and Content. It sounds like you figured this out and
interpreted my reply in a useful way nonetheless, though. So that's
good.

The post tool should be a valid way to commit.

As for your technique of updating the field types and reindexing the
documents, I think it should be fine provided you kept the field type
for the Message-Id field as string. If you changed it to text along
with the other field types, then there's a chance your update
technique might instead have had the effect of inserting a duplicate
copy of each document, so that there are two copies of each document,
one searchable and one not searchable. (I'm not totally sure about
this, but it's a worry I would have.) That doesn't sound like what's
happened to you, though.

Could the problem be that you're not specifying which field to query?
If you're using the standard query analyzer and the stock schema.xml,
then the default field name is "text", whereas you don't have a field
called "text" in your schema. In that setup, if you want to search on
the Content field you need to say so explicitly, like so:

Content:phillip

On Sat, Jan 24, 2009 at 7:25 AM, Johnny X jonathanwel...@gmail.com wrote:

 If it helps, everything appears when I use Luke to search through the
 index...but the search in that returns nothing either.

 When I search using the admin page for the word 'Phillip' (which appears the
 most in all of the documents) I get the following:

 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
     <lst name="params">
       <str name="indent">on</str>
       <str name="start">0</str>
       <str name="q">phillip</str>
       <str name="rows">10</str>
       <str name="version">2.2</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0" />
 </response>


 Duh...?



 Johnny X wrote:

 They all appear in the stats admin page under the NumDocs & maxDocs
 fields.

 I don't explicitly send a commit command, but my posting ends like this
 (suggesting they are committed):

 SimplePostTool: POSTing file 21166.xml
 SimplePostTool: POSTing file 21169.xml
 SimplePostTool: COMMITting Solr index changes..

 I just tried re-posting all the documents set as text -- will that
 update the current documents indexed? (bearing in mind the unique key,
 message-id, will be included again)

 When I try searching I still get 0 results for anything included in the
 message-id and content fields, both of which should be indexed and
 returning results...


 Cheers for any help!


 ryguasu wrote:

 These might be obvious, but:

 * I assume you did a Solr commit command after indexing, right?

 * If you are using the fieldtype definitions from the default
 schema.xml, then your string fields are not being analyzed, which
 means you should expect search results only if you enter the entire,
 exact value of one of the Message-ID or Date fields in your query. Is
 that your intention?

 And yes, your analysis of stored seems correct. Stored fields are
 those whose values you need back at query time, and indexed fields are
 those you can do queries on. For a few complications, see
 http://wiki.apache.org/solr/FieldOptionsByUseCase

 On Fri, Jan 23, 2009 at 8:04 PM, Johnny X jonathanwel...@gmail.com
 wrote:

 I've indexed my XML using the below in the schema:

   <field name="Message-ID" type="string" indexed="true" stored="true" required="true"/>
   <field name="Date" type="string" indexed="false" stored="true"/>
   <field name="From" type="string" indexed="false" stored="true"/>
   <field name="To" type="string" indexed="false" stored="true"/>
   <field name="Subject" type="string" indexed="false" stored="true"/>
   <field name="Mime-Version" type="string" indexed="false" stored="true"/>
   <field name="Content-Type" type="string" indexed="false" stored="true"/>
   <field name="Content-Transfer-Encoding" type="string" indexed="false" stored="true"/>
   <field name="X-From" type="string" indexed="false" stored="true"/>
   <field name="X-To" type="string" indexed="false" stored="true"/>
   <field name="X-cc" type="string" indexed="false" stored="true"/>
   <field name="X-bcc" type="string" indexed="false" stored="true"/>
   <field name="X-Folder" type="string" indexed="false" stored="true"/>
   <field name="X-Origin" type="string" indexed="false" stored="true"/>
   <field name="X-FileName" type="string" indexed="false" stored="true"/>
   <field name="Content" type="string" indexed="true" stored="true"/>

  <uniqueKey>Message-ID</uniqueKey>

 However searching via the Message-ID or Content fields returns 0 results.
 Using Luke I can still see that these fields are stored, however.

 Out of interest, by setting the other fields to just stored=true, can
 they
 be returned in a query as part of a search?


 Cheers.
 --
 View this message in context:
 http://www.nabble.com/Results-not-appearing-tp21637069p21637069.html
 Sent from the Solr - User mailing list archive at Nabble.com.







 --
 View this message in context: 
 http://www.nabble.com/Results-not-appearing-tp21637069p21641692.html
 Sent from the Solr - User 

Re: Solr stemming - preserve original words

2009-01-24 Thread AHMET ARSLAN
I still don't understand your final goal, but if you want to get an output in
the form of
run(40) = 20 from "running", 10 from "run", 8 from "runners", 2 from "runner"
you need to index your documents using the standard analyzer, walk through the
index using org.apache.lucene.index.IndexReader, and stem each term using a
stemmer. Storing the stems (key) and the original word list (value) in a map
will give that kind of output.
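A rough sketch of that walk (Lucene 2.x-era APIs, as bundled with Solr 1.3; the index path and the field name "content" are assumptions):

    import java.util.*;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.tartarus.snowball.ext.PorterStemmer;

    public class StemReport {
      public static void main(String[] args) throws Exception {
        // Walk every term of one field and group the original words under their stems.
        IndexReader reader = IndexReader.open("/path/to/solr/data/index");
        PorterStemmer stemmer = new PorterStemmer();
        Map<String, List<String>> stemToWords = new TreeMap<String, List<String>>();
        TermEnum terms = reader.terms();
        while (terms.next()) {
          Term t = terms.term();
          if (!"content".equals(t.field())) continue;
          stemmer.setCurrent(t.text());
          stemmer.stem();
          String stem = stemmer.getCurrent();
          List<String> words = stemToWords.get(stem);
          if (words == null) stemToWords.put(stem, words = new ArrayList<String>());
          words.add(t.text() + "(" + terms.docFreq() + ")"); // word with its doc frequency
        }
        terms.close();
        reader.close();
        System.out.println(stemToWords.get("run")); // e.g. [run(10), runner(2), running(20)]
      }
    }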

However, if seeing something like the following list (not exactly what you
want, but similar) on schema.jsp would help you,

run=run
run=running
run=runner
run=runners

add one line of code

newstr = newstr + "=" + new String(termBuffer, 0, len);

to org.apache.solr.analysis.EnglishPorterFilterFactory.java between lines #116
and #117.

Rename the file, compile the code, and put your jar file in the lib directory
under your Solr home. Now you can use your new FilterFactory in your schema.xml.


--- On Sat, 1/24/09, Thushara Wijeratna thu...@gmail.com wrote:

 From: Thushara Wijeratna thu...@gmail.com
 Subject: Re: Solr stemming - preserve original words
 To: solr-user@lucene.apache.org, iori...@yahoo.com
 Date: Saturday, January 24, 2009, 1:53 AM
 Chris, Ahmet - thanks for the responses.
 
 Ahmet - yes, I want to see "run" as a top term, plus the original
 words that formed that term.
 The reason is that due to mis-stemming, the terms could become
 non-English, e.g. "permanent" would stem to "perm", "archive" would
 become "archiv".
 
 I need to extract a set of keywords from the indexed content - I'd
 like these to be correct, full English words.
 
 thanks,
 thushara


  


size of solr update document a limitation?

2009-01-24 Thread Paul Libbrecht


Hello Solr experts,

Is it good practice to post large Solr update documents
(e.g. 100 KB - 2 MB)?
Will Solr do the necessary tricks to make the field use a reader
instead of strings?


thanks in advance

paul



Re: Results not appearing

2009-01-24 Thread Johnny X

Thanks for the reply.

I ended up fixing it by re-installing Tomcat and starting over. Searches now
appear to work.

Because I'm testing at the moment, however: is it possible to delete the
index and start afresh in future?

At the moment I have backed up the original index folder... if I just replace
the current one, including an index, with that backup, will it work... or will
other parts of Solr recognise it's changed and as a result not work?

What's the best solution for removing the index?


Cheers.



ryguasu wrote:
 
 I should clarify that I misspoke before; I thought you had
 indexed=true on Message-Id and Date, whereas you had it on
 Message-Id and Content. It sounds like you figured this out and
 interpreted my reply in a useful way nonetheless, though. So that's
 good.
 
 The post tool should be a valid way to commit.
 
 As for your technique of updating the field types and reindexing the
 documents, I think it should be fine provided you kept the field type
 for the Message-Id field as string. If you changed it to text along
 with the other field types, then there's a chance your update
 technique might instead have had the effect of inserting a duplicate
 copy of each document, so that there are two copies of each document,
 one searchable and one not searchable. (I'm not totally sure about
 this, but it's a worry I would have.) That doesn't sound like what's
 happened to you, though.
 
 Could the problem be that you're not specifying which field to query?
 If you're using the standard query analyzer and the stock schema.xml,
 then the default field name is "text", whereas you don't have a field
 called "text" in your schema. In that setup, if you want to search on
 the Content field you need to say so explicitly, like so:
 
 Content:phillip
 
 On Sat, Jan 24, 2009 at 7:25 AM, Johnny X jonathanwel...@gmail.com
 wrote:

 If it helps, everything appears when I use Luke to search through the
 index...but the search in that returns nothing either.

 When I search using the admin page for the word 'Phillip' (which appears
 the
 most in all of the documents) I get the following:

 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
     <lst name="params">
       <str name="indent">on</str>
       <str name="start">0</str>
       <str name="q">phillip</str>
       <str name="rows">10</str>
       <str name="version">2.2</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0" />
 </response>


 Duh...?



 Johnny X wrote:

 They all appear in the stats admin page under the NumDocs & maxDocs
 fields.

 I don't explicitly send a commit command, but my posting ends like this
 (suggesting they are committed):

 SimplePostTool: POSTing file 21166.xml
 SimplePostTool: POSTing file 21169.xml
 SimplePostTool: COMMITting Solr index changes..

 I just tried re-posting all the documents set as text -- will that
 update the current documents indexed? (bearing in mind the unique key,
 message-id, will be included again)

 When I try searching I still get 0 results for anything included in the
 message-id and content fields, both of which should be indexed and
 returning results...


 Cheers for any help!


 ryguasu wrote:

 These might be obvious, but:

 * I assume you did a Solr commit command after indexing, right?

 * If you are using the fieldtype definitions from the default
 schema.xml, then your string fields are not being analyzed, which
 means you should expect search results only if you enter the entire,
 exact value of one of the Message-ID or Date fields in your query. Is
 that your intention?

 And yes, your analysis of stored seems correct. Stored fields are
 those whose values you need back at query time, and indexed fields are
 those you can do queries on. For a few complications, see
 http://wiki.apache.org/solr/FieldOptionsByUseCase

 On Fri, Jan 23, 2009 at 8:04 PM, Johnny X jonathanwel...@gmail.com
 wrote:

 I've indexed my XML using the below in the schema:

   <field name="Message-ID" type="string" indexed="true" stored="true" required="true"/>
   <field name="Date" type="string" indexed="false" stored="true"/>
   <field name="From" type="string" indexed="false" stored="true"/>
   <field name="To" type="string" indexed="false" stored="true"/>
   <field name="Subject" type="string" indexed="false" stored="true"/>
   <field name="Mime-Version" type="string" indexed="false" stored="true"/>
   <field name="Content-Type" type="string" indexed="false" stored="true"/>
   <field name="Content-Transfer-Encoding" type="string" indexed="false" stored="true"/>
   <field name="X-From" type="string" indexed="false" stored="true"/>
   <field name="X-To" type="string" indexed="false" stored="true"/>
   <field name="X-cc" type="string" indexed="false" stored="true"/>
   <field name="X-bcc" type="string" indexed="false" stored="true"/>
   <field name="X-Folder" type="string" indexed="false" stored="true"/>
   <field name="X-Origin" type="string" indexed="false" stored="true"/>
   <field name="X-FileName" type="string" indexed="false" stored="true"/>
   <field name="Content" type="string" indexed="true" stored="true"/>

  <uniqueKey>Message-ID</uniqueKey>

 However searching via the 

Re: Results not appearing

2009-01-24 Thread Chris Harris
Without stopping Solr itself, a Solr client can remove all the
documents in an index by doing a delete-by-query with the query *:*
(without quotes). For XML interface clients, see
http://wiki.apache.org/solr/UpdateXmlMessage. Solrj would have another
way to do it. You'll need to do a commit after this to flush your
changes.
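For example, POSTing this to the /update handler, followed by the commit:

    <delete><query>*:*</query></delete>
    <commit/>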

Alternatively, you can stop Solr and delete the whole data/ directory,
which includes the index directory. If you do this, Solr will create a
new fresh one the next time it starts up.

For backups it might be a better habit to backup the data/ directory,
rather than just the data/index directory. Assuming your schema.xml
hasn't changed, then you should be able to restore one data/ directory
with another. If you're changing your schema file, though, you need to
make sure you restore a version of that file that is consistent with
the one that you indexed with.

On Sat, Jan 24, 2009 at 5:43 PM, Johnny X jonathanwel...@gmail.com wrote:

 Thanks for the reply.

 I ended up fixing it by re-installing Tomcat and starting over. Searches now
 appear to work.

 Because I'm testing at the moment, however: is it possible to delete the
 index and start afresh in future?

 At the moment I have backed up the original index folder... if I just replace
 the current one, including an index, with that backup, will it work... or will
 other parts of Solr recognise it's changed and as a result not work?

 What's the best solution for removing the index?


 Cheers.



 ryguasu wrote:

 I should clarify that I misspoke before; I thought you had
 indexed=true on Message-Id and Date, whereas you had it on
 Message-Id and Content. It sounds like you figured this out and
 interpreted my reply in a useful way nonetheless, though. So that's
 good.

 The post tool should be a valid way to commit.

 As for your technique of updating the field types and reindexing the
 documents, I think it should be fine provided you kept the field type
 for the Message-Id field as string. If you changed it to text along
 with the other field types, then there's a chance your update
 technique might instead have had the effect of inserting a duplicate
 copy of each document, so that there are two copies of each document,
 one searchable and one not searchable. (I'm not totally sure about
 this, but it's a worry I would have.) That doesn't sound like what's
 happened to you, though.

 Could the problem be that you're not specifying which field to query?
 If you're using the standard query analyzer and the stock schema.xml,
 then the default field name is "text", whereas you don't have a field
 called "text" in your schema. In that setup, if you want to search on
 the Content field you need to say so explicitly, like so:

 Content:phillip

 On Sat, Jan 24, 2009 at 7:25 AM, Johnny X jonathanwel...@gmail.com
 wrote:

 If it helps, everything appears when I use Luke to search through the
 index...but the search in that returns nothing either.

 When I search using the admin page for the word 'Phillip' (which appears
 the
 most in all of the documents) I get the following:

 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
     <lst name="params">
       <str name="indent">on</str>
       <str name="start">0</str>
       <str name="q">phillip</str>
       <str name="rows">10</str>
       <str name="version">2.2</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0" />
 </response>


 Duh...?



 Johnny X wrote:

 They all appear in the stats admin page under the NumDocs & maxDocs
 fields.

 I don't explicitly send a commit command, but my posting ends like this
 (suggesting they are committed):

 SimplePostTool: POSTing file 21166.xml
 SimplePostTool: POSTing file 21169.xml
 SimplePostTool: COMMITting Solr index changes..

 I just tried re-posting all the documents set as text -- will that
 update the current documents indexed? (bearing in mind the unique key,
 message-id, will be included again)

 When I try searching I still get 0 results for anything included in the
 message-id and content fields, both of which should be indexed and
 returning results...


 Cheers for any help!


 ryguasu wrote:

 These might be obvious, but:

 * I assume you did a Solr commit command after indexing, right?

 * If you are using the fieldtype definitions from the default
 schema.xml, then your string fields are not being analyzed, which
 means you should expect search results only if you enter the entire,
 exact value of one of the Message-ID or Date fields in your query. Is
 that your intention?

 And yes, your analysis of stored seems correct. Stored fields are
 those whose values you need back at query time, and indexed fields are
 those you can do queries on. For a few complications, see
 http://wiki.apache.org/solr/FieldOptionsByUseCase

 On Fri, Jan 23, 2009 at 8:04 PM, Johnny X jonathanwel...@gmail.com
 wrote:

 I've indexed my XML using the below in the schema:

   field name=Message-ID type=string indexed=true stored=true
 required=true/
   field name=Date 

Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Gunaranjan Chandraraju
I made this approach work with XPATH and XSL.  However, this approach
creates multiple fields like this:


address_state_1
address_state_2
...
address_state_10

and

credit_card_1
credit_card_2
credit_card_3


How do I search for a credit_card?  The query syntax does not seem
to support wildcards in field names.  E.g. I can't seem to do
this:  credit_card*:"1234 4567 7890 1234"


On the search side I would not know how many credit card fields got
created for a document, so I need that to be dynamic.


-g


On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:


Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:


On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju 
chandrar...@apple.com wrote:



<record>
  <coreInfo id="123" ... />
  <address street="XYZ1" State="CA" ... type="home" />
  <address street="XYZ2" state="CA" ... type="Office" />
  <address street="XYZ3" state="CA" type="Other" />
</record>

I have setup my DIH to treat these as entities as below

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity
          name="record"
          processor="XPathEntityProcessor"
          stream="false"
          forEach="/record"
          url="${f.fileAbsolutePath}">
        <field column="ID" xpath="/record/@id" />
        <!-- Address -->
        <entity
            name="record_adr"
            processor="XPathEntityProcessor"
            stream="false"
            forEach="/record/address"
            url="${f.fileAbsolutePath}">
          <field column="address_street" xpath="/record/address/@street" />
          <field column="address_state"  xpath="/record/address//@state" />
          <field column="address_type"   xpath="/record/address//@type" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>



I think the only way is to create a dynamic field for each attribute
(street, state etc.). Write a transformer to copy the fields from your
data config to appropriately named dynamic fields (e.g. street_1,
state_1, etc.).

To maintain this counter you will need to get/store it with
Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and
Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
--
Regards,
Shalin Shekhar Mangar.





--
Regards,
Shalin Shekhar Mangar.




Re: How to make Relationships work for Multi-valued Index Fields?

2009-01-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
For searching you need to put them in a single field; use copyField
in schema.xml to achieve that, as sketched below.
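A minimal sketch, assuming the per-instance fields are declared via a dynamic field and gathered into one indexed catch-all field:

    <dynamicField name="credit_card_*" type="string" indexed="false" stored="true"/>
    <field name="credit_card" type="string" indexed="true" stored="false" multiValued="true"/>
    <copyField source="credit_card_*" dest="credit_card"/>

You can then search credit_card:"1234 4567 7890 1234" instead of wildcarding field names.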

On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
chandrar...@apple.com wrote:
 I make this approach work with XPATH and XSL.   However, this approach
 creates multiple fields of like this

 address_state_1
 address_state_2
 ...
 address_state_10

 and

 credit_card_1
 credit_card_2
 credit_card_3


 How do I search for a credit_card?  The query syntax does not seem to
 support wildcards in field names.  E.g. I can't seem to do this:
 credit_card*:"1234 4567 7890 1234"

 On the search side I would not know how many credit card fields got created
 for a document, so I need that to be dynamic.

 -g


 On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:

 Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

 On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju 
 chandrar...@apple.com wrote:


 <record>
   <coreInfo id="123" ... />
   <address street="XYZ1" State="CA" ... type="home" />
   <address street="XYZ2" state="CA" ... type="Office" />
   <address street="XYZ3" state="CA" type="Other" />
 </record>

 I have setup my DIH to treat these as entities as below

 <dataConfig>
   <dataSource type="FileDataSource" encoding="UTF-8" />
   <document>
     <entity name="f" processor="FileListEntityProcessor"
             baseDir="***"
             fileName=".*xml"
             rootEntity="false"
             dataSource="null">
       <entity
           name="record"
           processor="XPathEntityProcessor"
           stream="false"
           forEach="/record"
           url="${f.fileAbsolutePath}">
         <field column="ID" xpath="/record/@id" />
         <!-- Address -->
         <entity
             name="record_adr"
             processor="XPathEntityProcessor"
             stream="false"
             forEach="/record/address"
             url="${f.fileAbsolutePath}">
           <field column="address_street" xpath="/record/address/@street" />
           <field column="address_state"  xpath="/record/address//@state" />
           <field column="address_type"   xpath="/record/address//@type" />
         </entity>
       </entity>
     </entity>
   </document>
 </dataConfig>


 I think the only way is to create a dynamic field for each attribute
 (street, state etc.). Write a transformer to copy the fields from your
 data config to appropriately named dynamic fields (e.g. street_1,
 state_1, etc.).
 To maintain this counter you will need to get/store it with
 Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and
 Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

 I can't think of an easier way.
 --
 Regards,
 Shalin Shekhar Mangar.




 --
 Regards,
 Shalin Shekhar Mangar.





-- 
--Noble Paul