Search and index Result

2011-04-14 Thread satya swaroop
Hi all,
   I made two copies of SolrDispatchFilter, solrdispatchfilter1 and
solrdispatchfilter2, so that all /update and /update/extract requests pass
through solrdispatchfilter1 and all search (/select) requests pass through
solrdispatchfilter2. I did this because I need to enforce privacy on the
search results: I need to check whether the requesting user has access to
particular files or not. Implementing the privacy of results was successful.
One major problem is that after indexing some documents and committing, I do
not get the committed data in the search results; I get the old data from
before the commit. I see the new results only after restarting the server.
Can anyone tell me what to modify so that search returns the results from the
most recent commit?


Thanks and Regards,
satya


Re: how to set cookie for url requesting in stream_url

2011-04-08 Thread satya swaroop
Hi All,
 I was able to set the cookie value on the stream_url connection. I passed
the cookie value through to the ContentStreamBase.URLStream class and added
conn.setRequestProperty("Cookie", cookie[0].getName() + "=" + cookie[0].getValue())
to the connection setup, and it is working fine now.
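
A minimal sketch of the same idea outside Solr, written in Python only for illustration: attach a Cookie header to an outgoing URL request before it is opened. The URL and the cookie name and value are hypothetical; the thread does not give them.

```python
import urllib.request


def build_request_with_cookie(url, name, value):
    """Return a Request object that carries a single name=value cookie."""
    req = urllib.request.Request(url)
    # Same idea as conn.setRequestProperty("Cookie", "name=value") in Java:
    # the cookie travels as an ordinary request header.
    req.add_header("Cookie", f"{name}={value}")
    return req


# Hypothetical remote file URL and session cookie, for illustration only.
req = build_request_with_cookie("http://remotehost/file.pdf", "session", "abc123")
print(req.get_header("Cookie"))
```

Nothing is sent over the network here; the request is only constructed, so the header can be inspected before use.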

Regards,
satya


Fwd: how to set cookie for url requesting in stream_url

2011-04-01 Thread satya swaroop
Hi Markus,
   I am using Solr branch_3x on the Tomcat web server.
Regards,
satya


how to set cookie for url requesting in stream_url

2011-03-31 Thread satya swaroop
Hi All,
To index documents that live on another server, I need to include a
cookie value in the request that Solr makes through stream_url.
Can anybody tell me how to include the cookie in that request?
Has anybody done this before? If there are any suggestions, please tell
me.

ex:
http://localhost:8456/solr/update/extract?stream_url=remote_server_url&literal.id=13748

Here I need to include a cookie value when Solr requests
remote_server_url.


Regards,
satya


Solr coding

2011-03-23 Thread satya swaroop
Hi All,
  As per my project requirement I need to keep file searches private, so
I need to modify the Solr code.

For example, if there are 5 users and each user indexes some files as
  user1 - java1, c1, sap1
  user2 - java2, c2, sap2
  user3 - java3, c3, sap3
  user4 - java4, c4, sap4
  user5 - java5, c5, sap5

   and user2 searches for the keyword java, then only the file java2 should
be displayed and not the other files.

To keep this filtering inside Solr itself, may I know where to modify the
code? I will access a database to check which files the user indexed and
then filter the result. I do not have any cores; I indexed all files in a
single index.

Regards,
satya


Re: Solr coding

2011-03-23 Thread satya swaroop
Hi Jayendra,
I forgot to mention that the result also depends on the user's
groups. It is somewhat complex, so I did not mention it. Now I will explain
the exact way.

  user1, group1 - java1, c1, sap1
  user2, group2 - java2, c2, sap2
  user3, group1, group3 - java3, c3, sap3
  user4, group3 - java4, c4, sap4
  user5, group3 - java5, c5, sap5

 "user1, group1" means user1 belongs to group1.


Here the filter includes the group too. For example, if user1 searches for
java, the results should be java1 and java3, since the java3 file is
accessible to all users who belong to group1. That is why I thought of
editing the code.
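
The access rule described above can be modelled in a few lines. This is only a sketch under one reading of the message: a file is visible to its owner and to any user who shares at least one group with the owner. The dictionaries mirror the table in the message; the function name and the prefix match standing in for a keyword search are hypothetical.

```python
# Group membership and indexed files, copied from the table in the message.
USER_GROUPS = {
    "user1": {"group1"},
    "user2": {"group2"},
    "user3": {"group1", "group3"},
    "user4": {"group3"},
    "user5": {"group3"},
}
FILES_BY_OWNER = {
    "user1": ["java1", "c1", "sap1"],
    "user2": ["java2", "c2", "sap2"],
    "user3": ["java3", "c3", "sap3"],
    "user4": ["java4", "c4", "sap4"],
    "user5": ["java5", "c5", "sap5"],
}


def visible_files(user, keyword):
    """Files matching keyword that the user may see under the assumed rule."""
    my_groups = USER_GROUPS[user]
    hits = []
    for owner, files in FILES_BY_OWNER.items():
        # Visible if the searcher owns the file or shares a group with its owner.
        if owner == user or USER_GROUPS[owner] & my_groups:
            # A prefix match stands in for the real keyword search.
            hits.extend(name for name in files if name.startswith(keyword))
    return sorted(hits)


print(visible_files("user1", "java"))
```

For user1 this yields java1 and java3, which matches the expected result stated in the message.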

Thanks,
satya


Re: Solr coding

2011-03-23 Thread satya swaroop
Hi Jayendra,
  The group field can be kept if the number of groups is
small. If a user may belong to 1000 groups, it would be difficult to
build the query. Also, if a user changes groups, we have to
reindex the data.

OK, I will try your suggestion. If it fulfils the needs, the task will be
very easy.
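
A sketch of what a query restricted by a group field might look like, assuming the documents were indexed with a field named `group` (a hypothetical name; the thread does not give one). It builds a /select URL whose fq filter lists the searching user's groups; the filter grows with the number of groups, which is the concern raised above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(base, keyword, groups):
    """Build a /select URL that filters results to the given groups."""
    # "group" is an assumed field name holding the groups allowed to see a doc.
    group_clause = " OR ".join(sorted(groups))
    params = [("q", keyword), ("fq", f"group:({group_clause})")]
    return f"{base}/select?{urlencode(params)}"


url = select_url("http://localhost:8080/solr", "java", {"group1", "group3"})
print(url)
# Decoding the query string again shows the filter the server would receive.
print(parse_qs(urlsplit(url).query)["fq"][0])
```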

Regards,
satya


solr indexing

2011-02-22 Thread satya swaroop
Hi all,
   Out of keen interest in the Solr indexing mechanism, I started reading the
code of Solr indexing (/update/extract). I read about the index file formats
and the scoring procedure, and I have a question about this.
1) Scoring uses both dynamic and precalculated values (doc boost, field
boost, lengthNorm). When calculating the score, suppose a term in the index
occurs in nearly one million docs: does Solr calculate the score for every
doc containing the term and then take the top docs from the index? Or is
there some mechanism that limits the score calculation to only particular
docs?

If anybody knows about this, or knows of any documentation on it, please
inform me.


Regards,
satya


is solr dynamic calculation??

2011-02-17 Thread satya swaroop
Hi All,
 I have a question: does Solr rank the result documents by calculating
the score dynamically, or does it precalculate the score and supply it?

For example:
if I query q=solr on my index, I get a result of 25 documents. What is it
calculating? I am very keen to know how it calculates the score and orders
the results.


Regards,
satya


Re: is solr dynamic calculation??

2011-02-17 Thread satya swaroop
Hi Markus,
As far as I have gone through Solr scoring, the scoring is
done at search time, using the boost values that were given during
indexing.
I have a question now. If I search for the keyword java:
1) If the term java occurs in 50,000 documents in the index, does Solr
calculate the score for every one of those documents, filter them, sort
them, and then serve the results? If it does the calculation dynamically
for every document, it would take a long time, so how does Solr reduce
it?
 Am I right? If anything is wrong, please tell me.
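
The thread itself does not say how Solr handles this. As general background only, one common way to avoid sorting every match is to score each matching document once and keep just the best N in a small bounded heap. The sketch below illustrates that technique with made-up scores; it is not taken from the Solr source.

```python
import heapq


def top_n(scored_docs, n):
    """Return the ids of the n best documents, best first.

    scored_docs is an iterable of (doc_id, score) pairs. Each document is
    looked at once, and at most n entries are held in memory at any time.
    """
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest kept entry
    for doc_id, score in scored_docs:
        entry = (score, doc_id)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Better than the weakest kept entry: swap it in.
            heapq.heapreplace(heap, entry)
    return [doc_id for _, doc_id in sorted(heap, reverse=True)]


# Made-up scores for illustration.
scores = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(top_n(scores, 2))
```

With this approach the cost of ordering depends on N (the rows requested) rather than on the total number of matching documents, although every match is still scored.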

Regards,
satya


Re: spell suggest response

2011-01-17 Thread satya swaroop
Hi Grijesh,
   Though I use autosuggest, I may not get the exact results, and the
order is not accurate. For example, if I type
http://localhost:8080/solr/terms/?terms.fl=spell&terms.prefix=solr&terms.sort=index&terms.lower=solr&terms.upper.incl=true
 I get results such as
solr
solr.amp
solr.datefield
solr.p
solr.pdf
   and so on. But this may not give results as accurate as the ones we get
from spellchecking.

I require suggestions for any word, irrespective of whether it is spelled
correctly or not. Is there anything to change in Solr to get suggestions like
the ones we get when we type a wrong word in spellchecking? If so, please let
me know.

Regards,
satya


Re: spell suggest response

2011-01-17 Thread satya swaroop
Hi Grijesh,
I added both the TermsComponent and the spellcheck component to the
terms request handler. When I send a query as
http://localhost:8080/solr/terms?terms.fl=text&terms.prefix=java&rows=7&omitHeader=true&spellcheck=true&spellcheck.q=java&spellcheck.count=20

the result I get is
<response>
<lst name="terms">
<lst name="text">
<int name="java">6</int>
<int name="javabas">6</int>
<int name="javas">6</int>
<int name="javascript">6</int>
<int name="javac">6</int>
<int name="javax">6</int>
</lst>
</lst>
<lst name="spellcheck">
<lst name="suggestions"/>
</lst>
</response>



When I send this
http://localhost:8080/solr/terms?terms.fl=text&terms.prefix=jawa&rows=5&omitHeader=true&spellcheck=true&spellcheck.q=jawa&spellcheck.count=20
I get the result as

<response>
<lst name="terms">
<lst name="text"/>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="jawa">
<int name="numFound">20</int>
<int name="startOffset">0</int>
<int name="endOffset">4</int>
<arr name="suggestion">
<str>java</str>
<str>away</str>
<str>jav</str>
<str>jar</str>
<str>ara</str>
<str>apa</str>
<str>ana</str>
<str>ajax</str>


Now I need to know how to order the terms. In the 1st query the result
obtained is in order, but I want only javax, javac and javascript, not
javas and javabas. How can it be done?
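
A sketch of how a client might pull the suggestion words out of a spellcheck response like the one above. The sample document is a shortened, hand-closed version of that output (the original is cut off after eight suggestions), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the response in the message, closed by hand.
SAMPLE = """<response>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="jawa">
<int name="numFound">2</int>
<arr name="suggestion">
<str>java</str>
<str>away</str>
</arr>
</lst>
</lst>
</lst>
</response>"""


def suggestions(xml_text):
    """Map each misspelled word to its list of suggested replacements."""
    root = ET.fromstring(xml_text)
    result = {}
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for entry in root.findall(path):
        words = [s.text for s in entry.findall("./arr[@name='suggestion']/str")]
        result[entry.get("name")] = words
    return result


print(suggestions(SAMPLE))
```

Once the words are in a plain list, filtering or reordering them on the client side is straightforward.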

Regards,
satya


spellchecking even the key is true....

2011-01-17 Thread satya swaroop
Hi All,
Can we get spellchecking results even when the keyword is spelled correctly?
Spellchecking gives suggestions only for wrong keywords. Can't we get
similar and near words for the keyword even though spellcheck.q is correct?
As an example,
http://localhost:8080/solr/spellcheck?q=java&spellcheck=true&spellcheck.count=5
the result will be

1)
<response>
<lst name="spellcheck">
<lst name="suggestions"/>
</lst>
</response>


Can we get the result as
2)
<response>
<lst name="spellcheck">
<lst name="suggestions">
<str>javax</str>
<str>javac</str>
<str>javabean</str>
<str>javascript</str>
</lst>
</lst>
</response>

NOTE: all the keywords in the 2nd result are in the index.

Regards,
satya


Re: spell suggest response

2011-01-16 Thread satya swaroop
Hi Grijesh,
As you said, you are implementing this kind of thing. Can you tell me
briefly how you did it?

Regards,
satya


Re: spell suggest response

2011-01-12 Thread satya swaroop
Hi Stefan,
I need the words from the index itself. If java is given,
then the relevant, similar, or near words in the index should be shown,
even when the given keyword is spelled correctly. Is that possible?


ex:-

http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10
   In the output no suggestions come back, because
java is a correctly spelled word.
  But can't we get near suggestions such as javax, javac, etc. (the
terms in the index)?

I read about the Suggester in the Solr wiki at
http://wiki.apache.org/solr/Suggester and tried to implement it, but I got
the error

*error loading class org.apache.solr.spelling.suggest.suggester*

Regards,
satya


Re: spell suggest response

2011-01-12 Thread satya swaroop
Hi Juan,
 Yes, I tried onlyMorePopular and got some results, but they are
not similar or near words to the word I gave in the query.
Here is the output.

http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true&spellcheck.count=20

The output I get is
<arr name="suggestion">
<str>data</str>
<str>have</str>
<str>can</str>
<str>any</str>
<str>all</str>
<str>has</str>
<str>each</str>
<str>part</str>
<str>make</str>
<str>than</str>
<str>also</str>
</arr>



But these words are not similar to the given word 'java'. The near words
would be javac, javax, data, java.io, etc. The stated words are present in
the index.


Regards,
satya


spell suggest response

2011-01-11 Thread satya swaroop
Hi All,
 Can we get just the suggestions, without the file results?
Here is an example.
When I query
http://localhost:8080/solr/spellcheckCompRH?q=java daka usar&spellcheck=true&spellcheck.count=5&spellcheck.collate=true

I get some java file results and then the suggestions for the words
daka -> data and usar -> user. But I actually need only the spell suggestions.
Time is being spent on returning the files before the spell suggestions
are given. Can't we post a query to Solr that returns only the spell
suggestions in the response?
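
Later replies in this thread use rows=0 on the same handler, which asks for zero documents while leaving the spellcheck section in place. A small sketch of building such a request; the helper name is hypothetical, and the handler path and parameters are the ones quoted in the thread.

```python
from urllib.parse import urlencode


def spellcheck_only_url(base, words, count=5):
    """Build a spellcheck request that asks for no documents (rows=0)."""
    params = [
        ("q", words),
        ("rows", 0),  # return no documents, only the other response sections
        ("spellcheck", "true"),
        ("spellcheck.count", count),
        ("spellcheck.collate", "true"),
    ]
    return f"{base}/spellcheckCompRH?{urlencode(params)}"


url = spellcheck_only_url("http://localhost:8080/solr", "java daka usar")
print(url)
```

urlencode also takes care of the spaces between the query words, which a hand-typed URL would leave unescaped.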

Regards,
satya


Re: spell suggest response

2011-01-11 Thread satya swaroop
Hi Gora,
   I am using Solr for file indexing and searching, but I have a
module where I do not need any file results, only the spell suggestions. So
I asked whether there is any way in Solr to get only the spell suggestion
responses. I think it is clear to you now. If not, tell me and I will try
to explain further.

Regards,
satya


Re: spell suggest response

2011-01-11 Thread satya swaroop
Hi Stefan,
  Yes, it works :). Thanks.
  But I have a question: is it possible to get spell
suggestions even if the word is spelled correctly? I mean words near
to it.
   ex:-

http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10
   In the output no suggestions come back, because
java is a correctly spelled word.
  But can't we get near suggestions such as javax, javac, etc.?

Regards,
satya


error in html???

2010-12-23 Thread satya swaroop
Hi All,

 I am able to get the response in JSON format in the success case by
adding wt=json to the query. But in case of errors I get the response in
HTML format.
  1) Is there any specific reason the error comes in HTML format?
  2) Can't we get the error result in JSON format?
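
Until the server side can be changed, a client can treat the two cases separately. The sketch below assumes only what the message states: successful responses carry JSON when wt=json is used, and error responses carry an HTML page. The function name and the 200-character cut-off are arbitrary choices for illustration.

```python
import json


def parse_solr_body(status, body):
    """Return (ok, payload): parsed JSON on success, a short error text otherwise."""
    if status == 200:
        try:
            return True, json.loads(body)
        except ValueError:
            # A 200 response that is not JSON is treated as an error too.
            pass
    # Error pages arrive as HTML; keep a trimmed copy for logging.
    return False, f"HTTP {status}: {body.strip()[:200]}"


print(parse_solr_body(200, '{"responseHeader": {"status": 0}}'))
print(parse_solr_body(500, "<html>HTTP Status 500</html>"))
```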

Regards,
satya


Re: error in html???

2010-12-23 Thread satya swaroop
Hi Erick,
   Every result comes in XML format by default. But when there is an error
such as HTTP 500 or HTTP 400, we get it in HTML format. My question is:
can't we get that HTML error as JSON or XML instead?

Regards,
satya


Different Results..

2010-12-22 Thread satya swaroop
Hi All,
 I am getting different results when I use certain special characters.
For example:
1) when I use this request
http://localhost:8080/solr/select?q=erlang!ericson
   the result obtained is
   <result name="response" numFound="1934" start="0">

2) when the request is
 http://localhost:8080/solr/select?q=erlang/ericson
the result is
  <result name="response" numFound="1" start="0">


My question is: does Solr treat the two queries differently, and how does
it interpret !, / and other special characters?
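
One thing that can be ruled out on the client side is the URL layer: percent-encoding the query value makes sure the server receives exactly the characters that were typed. The sketch below shows only that step. How the query parser then treats a character such as ! (which, as far as I know, the Lucene query syntax lists among its special characters) is a separate matter and is not addressed here.

```python
from urllib.parse import quote


def select_url(base, query):
    """Build a /select URL with the query value fully percent-encoded."""
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return f"{base}/select?q={quote(query, safe='')}"


print(select_url("http://localhost:8080/solr", "erlang!ericson"))
print(select_url("http://localhost:8080/solr", "erlang/ericson"))
```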


Regards,
satya


Re: Google like search

2010-12-16 Thread satya swaroop
Hi All,

 Thanks for your suggestions. I got the result I expected.

Cheers,
Satya


Testing Solr

2010-12-16 Thread satya swaroop
Hi All,

 I built Solr successfully and I am thinking of testing it with nearly
300 PDF files, 300 docs, 300 Excel files, and so on: nearly 300 files of
each type.
 Is there any dummy data available for testing Solr? Otherwise I need to
download each and every file individually.
Another question: are there any benchmarks of Solr?

Regards,
satya


Google like search

2010-12-14 Thread satya swaroop
Hi All,
 Can we get results like Google's, with some text from the document shown
for each hit? I was able to get the first 300 characters of a
file, but that is not helpful for me. Can I get the text around the
first occurrence of the search keyword in that file?

Regards,
Satya


Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy,
  I am not asking for highlighting. I think it can be
explained with an example, which I illustrate here.

When I post a query like this:

http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on

I get the result as follows:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="filename">Java%20debugging.pdf</str>
<str name="id">122</str>
<arr name="text1">
<str>
Table of Contents
If you're viewing this document online, you can click any of the topics
below to link directly to that section.
1. Tutorial tips 2
2. Introducing debugging  4
3. Overview of the basics 6
4. Lessons in client-side debugging 11
5. Lessons in server-side debugging 15
6. Multithread debugging 18
7. Jikes overview 20
</str>
</arr>
</doc>
</result>
</response>

Here the str field contains the first 300 characters of the file, as I set up
a field in schema.xml that copies only 300 characters.
But I do not want content like this. Is there any way to produce output as
follows:

<str>Java is one of the best language,java is easy to learn...</str>


where this content is at the start of the chapter, where the first occurrence
of the word java is found in the file?
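
To make the request concrete, here is a client-side sketch of the behaviour being asked for: a fixed-width excerpt that starts at the first occurrence of the search word rather than at the start of the file. It is an illustration of the desired output only, not a Solr feature; the function name and the default width of 300 are assumptions taken from the 300-character field described above.

```python
def first_hit_snippet(text, term, width=300):
    """Return up to `width` characters starting at the first match of `term`."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        # No match: fall back to the start of the text.
        return text[:width]
    return text[pos:pos + width]


# Made-up document text for illustration.
doc = (
    "Table of Contents 1. Tutorial tips 2. Introducing debugging "
    "Java is one of the best language, java is easy to learn..."
)
print(first_hit_snippet(doc, "java", 40))
```

The match is case-insensitive, so a search for java finds the capitalised Java at the start of the chapter text.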


Regards,
Satya


Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy,
 Thanks for your reply. Sorry to ask this type of question:
how can we index each chapter of a file as a separate document? As far as I
know, we just give Solr the path of the file to index it. Can you point me to
any sources for this, I mean any blogs or wikis?

Regards,
satya


Re: RAM increase

2010-10-29 Thread satya swaroop
Hi All,

 Thanks for your reply. I have a doubt whether to increase the RAM or
heap size for Java or for Tomcat, where Solr is running.


Regards,
satya


Re: solr result....

2010-10-28 Thread satya swaroop
Hi Lance,
  I actually copied Tika exceptions into an HTML file and indexed
it. It is just the content of a file. Here is what I mean:


If I post a query like *java*, then the response from Solr should
return only a part of the content, as follows:

http://localhost:8456/solr/select/?q=java&version=2.2&start=10&rows=10&indent=on

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">453</int>
</lst>
<result name="response" numFound="62" start="10">
<doc>
<arr name="content_type">
<str>application/pdf</str>
</arr>
<str name="id">javaebuk</str>
<date name="last_modified">2001-07-02T11:54:10Z</date>
<arr name="text">
<str>

A Java program with two main methods  The following is an example of a java
program with two main methods with different signatures.
Program 3
public class TwoMains
{
/** This class has two main methods with
* different signatures */
public static void main (String args[])  .
  </str>
 </arr>
</doc>.

</response>




The doc in the result should not contain the entire content of a file. It
should have only a part of the content. The content should be the first hit
of the word java in that file.


Regards,
satya


solr result....

2010-10-27 Thread satya swaroop
Hi,
  Can the Solr result show only a part of the content of a
document that is returned in the result?
Example:

If I send a query to search for tika, then the result comes out as follows:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">79</int>
</lst>
<result name="response" numFound="62" start="0">
<doc>
<arr name="content_type">
   <str>text/html</str>
</arr>
 <str name="id">1html</str>
<arr name="text">
<str>
   Apache Tomcat/6.0.26 - Error reportHTTP Status 500 -
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@cc9d70

org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@cc9d70
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:214)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)...

 </str>
   </arr>
</doc>


The result should not show the entire content of a file. It should show
only the part of the content where the query word is present, like
Google results and like the search results on lucidimagination.

Regards,
satya


RAM increase

2010-10-20 Thread satya swaroop
Hi all,
  I increased my RAM to 8GB and I want 4GB of it to be used
for Solr itself. Can anyone tell me how to allocate that RAM to
Solr?


Regards,
satya


solr requirements

2010-10-18 Thread satya swaroop
Hi All,
I am planning to have a separate server for Solr, and I have a
doubt about what hardware configuration is needed.
I know it will be hard to tell, but I just need a minimum requirement for the
following situation:


1) There are 1000 regular users using Solr, and every day each user indexes
10 files of 1KB each, which comes to about 10MB a day, and it
goes on like that.

2) How much RAM does Solr use in general?
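
The arithmetic behind the 10MB figure, written out with a one-year projection added. This counts the raw size of the submitted files only; the size of the resulting index depends on the schema and is not estimated here. Decimal units (1MB = 1000KB) are assumed.

```python
USERS = 1000
FILES_PER_USER_PER_DAY = 10
FILE_SIZE_KB = 1

# Raw input volume per day.
daily_kb = USERS * FILES_PER_USER_PER_DAY * FILE_SIZE_KB
daily_mb = daily_kb / 1000

# The same rate sustained for a year.
yearly_gb = daily_mb * 365 / 1000

print(f"{daily_kb} KB per day = {daily_mb:.0f} MB per day")
print(f"about {yearly_gb:.2f} GB of raw files per year")
```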

Thanks,
satya


Re: solr requirements

2010-10-18 Thread satya swaroop
Hi,
   Here is some more info about it. I use Solr to output only the file
names (file ids). Here I enclose the fields from my schema.xml. At present I
have only about 40MB of indexed data.


   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="sku" type="textTight" indexed="true" stored="false" omitNorms="true"/>
   <field name="name" type="textgen" indexed="true" stored="false"/>

   <field name="manu" type="textgen" indexed="true" stored="false" omitNorms="true"/>
   <field name="cat" type="text_ws" indexed="true" stored="false" multiValued="true" omitNorms="true" />
   <field name="features" type="text" indexed="true" stored="false" multiValued="true"/>
   <field name="includes" type="text" indexed="true" stored="false" termVectors="true" termPositions="true" termOffsets="true" />

   <field name="weight" type="float" indexed="true" stored="false"/>
   <field name="price"  type="float" indexed="true" stored="false"/>
   <field name="popularity" type="int" indexed="true" stored="false" />
   <field name="inStock" type="boolean" indexed="true" stored="false" />

   <!--
   The following store examples are used to demonstrate the various ways one
   might _CHOOSE_ to implement spatial.  It is highly unlikely that you would
   ever have ALL of these fields defined.
   -->
   <field name="store" type="location" indexed="true" stored="false"/>
   <field name="store_lat_lon" type="latLon" indexed="true" stored="false"/>
   <field name="store_hash" type="geohash" indexed="true" stored="false"/>


   <!-- Common metadata fields, named specifically to match up with
     SolrCell metadata when parsing rich documents such as Word, PDF.
     Some fields are multiValued only because Tika currently may return
     multiple values for them.
   -->
   <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="subject" type="text" indexed="true" stored="false"/>
   <field name="description" type="text" indexed="true" stored="false"/>
   <field name="comments" type="text" indexed="true" stored="false"/>
   <field name="author" type="textgen" indexed="true" stored="false"/>
   <field name="keywords" type="textgen" indexed="true" stored="false"/>
   <field name="category" type="textgen" indexed="true" stored="false"/>
   <field name="content_type" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="false"/>
   <field name="links" type="string" indexed="true" stored="false" multiValued="true"/>
   <!-- added here content satya -->
   <field name="content" type="spell" indexed="true" stored="false" multiValued="true"/>


   <!-- catchall field, containing all other searchable text fields
     (implemented via copyField further on in this schema -->
   <field name="text" type="text" indexed="true" stored="false" multiValued="true" termVectors="true"/>

   <!-- catchall text field that indexes tokens both normally and in reverse
     for efficient leading wildcard queries.  here satya -->
   <field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/>

   <!-- non-tokenized version of manufacturer to make it easier to sort or
     group results by manufacturer.  copied from manu via copyField here
     satya -->
   <field name="manu_exact" type="string" indexed="true" stored="false"/>
   <field name="spell" type="spell" indexed="true" stored="false" multiValued="true"/>
   <!-- here changed -->
   <field name="payloads" type="payloads" indexed="true" stored="false"/>

   <field name="timestamp" type="date" indexed="true" stored="false" default="NOW" multiValued="false"/>



Regards,
satya


ant build problem

2010-10-04 Thread satya swaroop
Hi all,
I updated my Solr trunk to revision 1004527. When I compile
the trunk with ant I get many warnings, but the build is successful. The
warnings are here:
common.compile-core:
[mkdir] Created dir:
/home/satya/temporary/trunk/lucene/build/classes/java
[javac] Compiling 475 source files to
/home/satya/temporary/trunk/lucene/build/classes/java
[javac] warning: [path] bad path element
/usr/share/ant/lib/hamcrest-core.jar: no such file or directory
[javac]
/home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:455:
warning: [cast] redundant cast to int
[javac]  int hiByte = (int)(curChar >> 8);
[javac]   ^
[javac]
/home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:705:
warning: [cast] redundant cast to int
[javac]  int hiByte = (int)(curChar >> 8);
[javac]   ^
[javac]
/home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:812:
warning: [cast] redundant cast to int
[javac]  int hiByte = (int)(curChar >> 8);
[javac]   ^
[javac]
/home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:983:
warning: [cast] redundant cast to int
[javac]  int hiByte = (int)(curChar >> 8);
[javac]   ^
[javac]
/home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/search/FieldCacheImpl.java:209:
warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: T
[javac] key.creator.validate( (T)value, reader);
[javac]  ^
[javac]
/home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/search/FieldCacheImpl.java:278:
warning: [unchecked] unchecked call to
Entry(java.lang.String,org.apache.lucene.search.cache.EntryCreator<T>) as a
member of the raw type org.apache.lucene.search.FieldCacheImpl.Entry
[javac] return (ByteValues)caches.get(Byte.TYPE).get(reader, new
Entry(field, creator));
ptionList.addAll(exceptions);

||

[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files additionally use unchecked or unsafe
operations.
[javac] 100 warnings

BUILD SUCCESSFUL
Total time: 19 seconds


Here I have included only the first part of the warnings.
After compiling I ran ant test to check, but
it failed.

I did not find any hamcrest-core.jar in my ant library.
I use ant 1.7.1.


Regards,
satya


ant package

2010-09-21 Thread satya swaroop
Hi all,
I want to build the package of my Solr, and I found it can be done
using ant. When I type ant package in the solr module I get this error:


sa...@swaroop:~/temporary/trunk/solr$ ant package
Buildfile: build.xml

maven.ant.tasks-check:

BUILD FAILED
/home/satya/temporary/trunk/solr/common-build.xml:522:
##
  Maven ant tasks not found.
  Please make sure the maven-ant-tasks jar is in ANT_HOME/lib, or made
  available to Ant using other mechanisms like -lib or CLASSPATH.
  ##

Total time: 0 seconds


Can anyone tell me the procedure to build it, or give any information about
it?

Regards,
satya


Re: ant package

2010-09-21 Thread satya swaroop
Hi,
  Yes, I do not have the jar file in ant/lib. Where can I get the jar
file, or what is the procedure to build maven-artifact-ant-2.0.4-dep.jar?

regards,
satya


Re: ant package

2010-09-21 Thread satya swaroop
Hi Erick,
 Thanks for the reply. I downloaded the jar file and put it
in the ant library.
Now when I run the ant package command, I get an error in the middle of the
build, in generate-maven-artifacts. The error is

sa...@geodesic-desktop:~/temporary/trunk/solr$ sudo  ant  package
---
---
---
generate-maven-artifacts:
[mkdir] Created dir: /home/satya/temporary/trunk/solr/build/maven
[mkdir] Created dir: /home/satya/temporary/trunk/solr/dist/maven
 [copy] Copying 1 file to
/home/satya/temporary/trunk/solr/build/maven/src/maven
[artifact:install-provider] Installing provider:
org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2

BUILD FAILED
/home/satya/temporary/trunk/solr/build.xml:853: The following error occurred
while executing this line:
/home/satya/temporary/trunk/solr/common-build.xml:373: artifact:deploy
doesn't support the uniqueVersion attribute

Total time: 1 minute 51 seconds
sa...@desktop:~/temporary/trunk/solr$

Regards,
satya


SolrCloud new....

2010-09-20 Thread satya swaroop
Hi all,
I have 4 instances of Solr on 4 systems; each system has a
single instance of Solr. I want results from all these servers. I came
to know about SolrCloud. I read about it and worked through the example, and
it worked as described in the wiki.
I am using Solr 1.4 and Apache Tomcat. To implement cloud in the
Solr trunk, what procedure should be followed?
1) Should I copy the libraries from cloud to trunk?
2) Should I keep the cloud module on every system?
3) I am not using any cores in Solr. It is a single Solr on every
system. Can SolrCloud support that?
4) The example is given for Jetty. Is it done the same way in Tomcat?

Regards,
satya


cloud or zookeeper

2010-09-15 Thread satya swaroop
Hi All,
   What is the difference between using shards, SolrCloud and ZooKeeper?
Which is the best way to scale Solr?
 I need to reduce the index size on every system and reduce the search time
for a query.

Regards,
satya


Re: stream.url

2010-09-08 Thread satya swaroop
Hi Hoss,

 Thanks for the reply; it is working now. The reason was, as you
said, that I was not double escaping. I used %2520 for whitespace and it is
working now.
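
A sketch of why %2520 appears: the file name is escaped once inside the remote URL's own query string, and then the whole remote URL is escaped again because it is itself a parameter value (stream.url) of the Solr request. The helper name is hypothetical, and the remote host is written without the port, which the thread leaves unspecified.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def extract_url(solr_base, remote_base, file_name, doc_id):
    """Build an /update/extract URL whose stream.url value is double-escaped."""
    # First escape: the file name inside the remote URL's own query string.
    remote_url = f"{remote_base}?file={quote(file_name, safe='')}"
    # Second escape: the whole remote URL as the value of stream.url.
    query = urlencode([("stream.url", remote_url), ("literal.id", doc_id)])
    return f"{solr_base}/update/extract?{query}"


url = extract_url(
    "http://localhost:8080/solr",
    "http://remotehost/file_download.yaws",
    "solr & apache.pdf",
    "schb5",
)
print(url)
# Decoding once, as the receiving server would, yields a remote URL that is
# still correctly escaped.
print(parse_qs(urlsplit(url).query)["stream.url"][0])
```

The space becomes %20 in the first pass, and the percent sign of that %20 becomes %25 in the second, giving %2520.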

Thanks,
satya


Re: stream.url

2010-09-03 Thread satya swaroop
Hi all,

  I am unable to index files on a remote system whose file names contain
escaped characters. I think there is a problem in Solr with
indexing remote files that have escaped characters in their names.
Has anybody tried to index files on a remote system whose names contain
escaped characters? Solr works fine for files that have no
escaped characters in their names.


I sent the request through curl with the filename URL-encoded,
but the problem is the same.

Regards,
satya


stream.url

2010-09-02 Thread satya swaroop
Hi all,

  I am using stream.url to index files on a remote system. When I
use the url as
1) curl http://localhost:8080/solr/update/extract?stream.url=http://remotehost:port/file_download.yaws?file=yaws_presentation.pdf&literal.id=schb4

it works and I get a response saying the file got indexed.

But when I use
2) curl http://localhost:8080/solr/update/extract?stream.url=http://remotehost:port/file_download.yaws?file=solr & apache.pdf&literal.id=schb5
I get an error in Solr. I replaced the special characters with %20
for space and %26 for &, but the error is the same:

Unexpected end of file from server java.net.SocketException..

When I request http://remotehost:port/file_download.yaws?file=solr & apache.pdf
directly, without Solr, the file is downloaded to my system.

Here is the entire error:

HTTP Status 500 - Unexpected end of file from server
java.net.SocketException: Unexpected end of file from server at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1368)
at java.security.AccessController.doPrivileged(Native Method) at
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1362)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016)
at
org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:169)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:57)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:133)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1355) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:340)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619) Caused by:
java.net.SocketException: Unexpected end of file from server at
sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769) at
sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at
sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:766) at
sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
at
sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2173)
at java.net.URLConnection.getContentType(URLConnection.java:485) at
org.apache.solr.common.util.ContentStreamBase$URLStream.init(ContentStreamBase.java:81)
at
org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:138)
at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:117)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:226)
... 12 more


Can anybody provide information about this?


Regards,
Satya


Re: stream.url

2010-09-02 Thread satya swaroop
Hi Stefan,
   I used escape characters and it worked. There is no problem for the
single file 'solr & apache.pdf', but the same problem still appears for
files like Wireless lan.ppt and Tom info.pdf.

The curl command I sent is:

curl "http://localhost:8080/solr/update/extract?stream.url=http://remotehost:port/file_download.yaws%3Ffile=solr%20%26%20apache.pdf&literal.id=schb5"

Regards,
satya


Re: stream.url

2010-09-02 Thread satya swaroop
Hi,
I ran the curl command from the shell (command prompt or terminal) with the
escaped characters, but the error is the same. When I checked on the remote
system, the request is not reaching it. Is there anything that needs to be
changed in the config file in order to enable escaped characters in
stream.url?

Has anybody tried indexing files on a remote system through stream.url where
the file names contain characters that need escaping, such as & or a space?

regards,
satya
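
A minimal sketch of the escaping discussed in this thread, written in Python
only for illustration (the thread itself uses curl). It assumes the problem
is that the remote URL is not percent-encoded as a whole when it is passed as
the stream.url value. The host, port, file name and id are the ones quoted in
the thread; build_extract_url is a hypothetical helper, not part of Solr. The
file name is encoded once for the remote server, and the whole remote URL is
encoded again as a query parameter value, so a space ends up as %2520 and an
& inside the remote URL cannot be mistaken for a Solr parameter separator.

```python
from urllib.parse import quote, urlencode


def build_extract_url(solr_base, remote_url, doc_id):
    """Build an /update/extract URL whose stream.url value is fully escaped.

    Hypothetical helper for illustration; it only assembles a string.
    """
    params = {"stream.url": remote_url, "literal.id": doc_id}
    # quote_via=quote encodes a space as %20 (not '+') and, with the default
    # safe='', also encodes ':', '/', '?', '=', '&' and '%' inside the value.
    return solr_base + "/update/extract?" + urlencode(params, quote_via=quote)


# First level: escape the file name for the remote server.
inner = ("http://remotehost:8011/file_download.yaws?file="
         + quote("Wireless Lan.pdf", safe=""))

# Second level: escape the whole remote URL as a Solr parameter value.
url = build_extract_url("http://localhost:8080/solr", inner, "su8")
print(url)
```

The printed URL can be passed to curl inside quotes. Whether
file_download.yaws then accepts the decoded file name is a separate question
that this sketch does not answer.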


solr working...

2010-08-26 Thread satya swaroop
Hi all,
  I am interested in understanding how Solr works.
1) Can anyone tell me where to start in order to learn how it works?

Regards,
satya


Re: solr working...

2010-08-26 Thread satya swaroop
Hi Peter,
I am already working with Solr and it is working well. But I want
to understand the code: where the actual work happens, how indexing is
done, how requests are parsed, how responses are produced, and so on.
I was asking where to start in order to understand the code.

Regards,
satya


Re: solr working...

2010-08-26 Thread satya swaroop
Hi all,

  Thanks for your response and information. I used the slf4j logger and put
a log.info call in every class of the solr module to find out which classes
get invoked for a particular request handler or when Solr starts. I was able
to add it only in the solr module, not in the lucene module; I get an error
when I use it in that module. Can anyone tell me other ways like this to
trace the path through Solr?

Regards,
  satya


reduce the content???

2010-08-25 Thread satya swaroop
Hi all,
  I indexed nearly 100 Java PDF files, each of which is large (at least
1 MB). Solr returns the results with the entire content that it indexed,
which makes displaying the results slow. Can we reduce the content it
returns, or can I get just the file names and ids instead of the entire
content in the results?

Regards,
satya
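
One common way to keep responses small is Solr's fl (field list) request
parameter, which restricts the stored fields returned for each hit. The
sketch below only builds such a query URL. The /select path and port are
taken from elsewhere in this archive; the field name "name" is an assumption
about the schema and build_select_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_select_url(solr_base, query, fields, rows=10):
    """Build a /select URL that asks Solr to return only the given fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # e.g. "id,name" instead of every stored field
        "rows": rows,
    }
    return solr_base + "/select?" + urlencode(params)


# "name" is assumed to be the field holding the file name in this schema.
print(build_select_url("http://localhost:8080/solr", "java", ["id", "name"]))
```

With fl set this way the large extracted content field is still indexed and
searchable, but it is left out of the response.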


Re: stream.url problem

2010-08-24 Thread satya swaroop

 Hi all,
 I found the solution to my problem. I had changed my port number but
 kept the old one in the stream.url, so that was the problem.
 Thanks, all.

 Now I have another problem. It occurs when I send requests to the remote
 system for files whose names contain characters that need escaping, such
 as & or a space. For example, for TomJerry.pdf I get the error
 "Unexpected end of file from server".

 The request I sent is:

 curl "http://localhost:8080/solr/update/extract?stream.url=http://remotehost:8011/file_download.yaws?file=Wireless%20Lan.pdf&literal.id=su8"

 Here file_download.yaws is a module that fetches the file and gives it to
 Solr.

 Solr is able to index remote files whose names do not contain such
 characters, for example apache.txt and solr_apache.pdf.

 The error I got is:

 HTTP Status 500 - Unexpected end of file from server
 java.net.SocketException: Unexpected end of file from server at
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at
 sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1368)
 at java.security.AccessController.doPrivileged(Native Method) at
 sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1362)
 at
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016)
 at
 org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:161)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:57)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:133)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1355) at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:340)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:619) Caused by:
 java.net.SocketException: Unexpected end of file from server at
 sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769) at
 sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at
 sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:766) at
 sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
 at
 sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2173)
 at java.net.URLConnection.getContentType(URLConnection.java:485) at
 org.apache.solr.common.util.ContentStreamBase$URLStream.init(ContentStreamBase.java:81)
 at
 org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:138)
 at
 org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:117)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:226)
 ...




Regards,
 satya
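
When a trace like the one above has been pasted as one wrapped paragraph,
the innermost "Caused by:" entry is usually the most useful part. A small
sketch, assuming the trace text is available as a string; root_cause is a
hypothetical helper, and the pattern would cut a message short if the
message itself contained the word "at" surrounded by spaces.

```python
import re


def root_cause(trace_text):
    """Return the last 'Caused by:' entry of a line-wrapped Java stack trace,
    or None if the trace has no such entry."""
    # Undo the mail wrapping: collapse all whitespace runs into single spaces.
    flat = " ".join(trace_text.split())
    # Capture everything after 'Caused by:' up to the first stack frame
    # (introduced by ' at ') or up to the end of the text.
    causes = re.findall(r"Caused by:\s*(.+?)(?=\s+at\s|\s*$)", flat)
    return causes[-1] if causes else None


sample = (
    "at java.lang.Thread.run(Thread.java:619) Caused by:\n"
    "java.net.SocketException: Unexpected end of file from server at\n"
    "sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)"
)
print(root_cause(sample))
```

For the trace in this message the result points at the HTTP client reading
the remote server's reply, which suggests looking at what the remote server
sent back rather than at Solr's own handlers.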


/update/extract

2010-08-19 Thread satya swaroop
Hi all,
   When the extract request handler handles a request, which classes get
invoked? I need to know the flow through the classes when we send a file to
Solr. Can anybody tell me the classes, or any sources where I can find the
answer? Or can anyone tell me which classes get invoked when Solr starts?
I would be thankful if anybody could help me with this.

Regards,
satya


solr working...

2010-08-18 Thread satya swaroop
Hi all,
I am very interested in understanding how Solr works. Can anyone tell me
which modules or classes get invoked when we start a servlet container
like Tomcat, or when we send a request to Solr such as posting PDF files,
or which files are loaded when Solr starts?

regards,
satya


stream.url problem

2010-08-17 Thread satya swaroop
Hi all,
   I am indexing documents from my own system into Solr. Now I need to
index files that are on a remote system. I enabled remote streaming (set to
true) in solrconfig.xml, and when I use stream.url it fails with a
"Connection refused" error. The details of the error are below.

When I send this request in my browser:

http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf&literal.id=schb2

I get the error:

HTTP Status 500 - Connection refused java.net.ConnectException: Connection
refused at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown
Source) at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1368)
at java.security.AccessController.doPrivileged(Native Method) at
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1362)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016)
at
org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:161)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619) Caused by:
java.net.ConnectException: Connection refused at
java.net.PlainSocketImpl.socketConnect(Native Method) at
java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195) at
java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182) at
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at
java.net.Socket.connect(Socket.java:525) at
java.net.Socket.connect(Socket.java:475) at
sun.net.NetworkClient.doConnect(NetworkClient.java:163) at
sun.net.www.http.HttpClient.openServer(HttpClient.java:394) at
sun.net.www.http.HttpClient.openServer(HttpClient.java:529) at
sun.net.www.http.HttpClient.init(HttpClient.java:233) at
sun.net.www.http.HttpClient.New(HttpClient.java:306) at
sun.net.www.http.HttpClient.New(HttpClient.java:323) at
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:860)
at
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:801)
at
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
at
sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2173)
at java.net.URLConnection.getContentType(URLConnection.java:485) at
org.apache.solr.common.util.ContentStreamBase$URLStream.init(ContentStreamBase.java:81)
at
org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:136)
at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:116)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
...


If anybody knows how to solve this, please help me.

regards,
satya
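
"Connection refused" generally means that nothing accepted the TCP
connection at the target host and port. In the request above stream.url has
no explicit port, so the connection goes to port 80 on remotehost. The
sketch below is one way to check reachability from the Solr machine before
involving Solr at all; can_connect is a hypothetical helper and the
3 second timeout is an arbitrary choice.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

If this returns False for the stream.url value, the remote machine is
probably not running an HTTP server on that port. A plain filesystem path
such as /home/san/Desktop/... is only reachable over HTTP if a web server on
the remote host actually serves it.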


Re: indexing???

2010-08-17 Thread satya swaroop
Hi,

1) I use Tika 0.8.

2) The URL is https://issues.apache.org/jira/browse/PDFBOX-709 and the
file is samplerequestform.pdf.

3) The command and the entire error are:

curl "http://localhost:8080/solr/update/extract?stream.file=/home/satya/my_workings/satya_ebooks/8-Linux/samplerequestform.pdf&literal.id=linuxc"

Apache Tomcat/6.0.26 - Error report

HTTP Status 500 - org.apache.tika.exception.TikaException:
Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@1d688e2

org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@1d688e2
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:214)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:144)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
... 18 more
Caused by: java.lang.ClassCastException:
org.apache.pdfbox.pdmodel.font.PDFontDescriptorAFM cannot be cast to
org.apache.pdfbox.pdmodel.font.PDFontDescriptorDictionary
at
org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:167)
at
org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:117)
at
org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:140)
at
org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76)
at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225)
at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
at
org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
at
org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
... 21 more
type: Status report
message: org.apache.tika.exception.TikaException:
Unexpected RuntimeException from

Re: indexing???

2010-08-16 Thread satya swaroop
Hi all,
   The error I got is "Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@8210fc" when I indexed a file similar
to the one at
   https://issues.apache.org/jira/browse/PDFBOX-709/samplerequestform.pdf
Can't we index files of that type in Solr?

regards,
satya


indexing???

2010-08-12 Thread satya swaroop
Hi all,
   Indexing with Solr is going well, but I got an error when indexing one
particular PDF file. When I searched the mailing list for the error, I found
that it was due to the copyright protection on that file. Can't we index a
file that has copyright or other digital rights restrictions?

regards,
  satya


spell checking problem

2010-07-29 Thread satya swaroop
Hi all,
  I need some help with spellchecking. I configured my solrconfig.xml and
schema.xml by following the user mailing list; here is the configuration
I made.

My schema.xml:

<fieldType name="spellText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spell" type="spellText" indexed="true" stored="true" multiValued="true"/>

<copyField source="*" dest="spell"/>



My solrconfig.xml:

<requestHandler name="spellchecker" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">spellText</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>
    <!-- the default field in solrconfig if i change to spell field then the dictionary is not created -->
    <str name="spellcheckIndexDir">./spell</str>
    <str name="buildOnCommit">true</str>
  </lst>

  <!-- a spellchecker that uses a different distance measure -->
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellcheckerjaro</str>
  </lst>

</searchComponent>




1) The problem here is that for the default dictionary the index is created,
but if I type "jawa" the suggestions it gives are "data" and "sata", whereas
the correct suggestion is "java". I have nearly 20 Java docs indexed.
2) Another problem: if I build the jarowinkler dictionary, which uses the
spell field, the dictionary is not created and I only see segments.gen and
segments_1 in its directory.


regards,
satya
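
A small worked check of the first point, using plain Levenshtein edit
distance. This is only an illustration and not the scoring Solr actually
applies, but it shows that "java" is closer to "jawa" than either suggestion
that came back. One possible reading, stated here as an assumption, is that
"java" is simply not a term in the dictionary built from the name field.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


for candidate in ("java", "data", "sata"):
    print(candidate, levenshtein("jawa", candidate))
```

The distance is 1 for "java" and 2 for both "data" and "sata", so a
dictionary that contained "java" would be expected to offer it first.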


spell checking....

2010-07-26 Thread satya swaroop
Hi all,
I am new to Solr and have been able to index documents by following the
Solr wiki. Now I am trying to add spellchecking. I followed the
SpellCheckComponent page in the wiki but I am not getting any suggested
spellings. I first built the dictionary with spellcheck.build=true.

Here is an example:

http://localhost:8080/solr/spell?q=javs&spellcheck=true&spellcheck.collate=true

<response>
...
</result>

<lst name="spellcheck">
  <lst name="suggestions"/>
</lst>
</response>


Here the response should actually suggest "java", but it did not.

Can anyone guide me on this?
 I am using Solr 1.4 with Tomcat on Ubuntu.





Regards,
swarup


Re: spell checking....

2010-07-26 Thread satya swaroop
This is in solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>

    <str name="classname">solr.IndexBasedSpellChecker</str>

    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>

  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>

  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>

<!--
  The SpellingQueryConverter to convert raw (CommonParams.Q) queries into
  tokens.  Uses a simple regular expression
  to strip off field markup, boosts, ranges, etc. but it is not guaranteed
  to match an exact parse from the query parser.

  Optional, defaults to solr.SpellingQueryConverter
-->
<queryConverter name="queryConverter" class="org.apache.solr.spelling.SpellingQueryConverter"/>


I added the following to the standard request handler:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    <!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
    <str name="spellcheck.dictionary">default</str>
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!--  The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

</requestHandler>


Re: problem with storing??

2010-07-18 Thread satya swaroop
Hi all,
   Solr is working well now. I am working on Ubuntu and I was indexing
documents that did not have the required permissions, so that was the
problem. I thank all of you for your replies to my queries.
  Thanking you,
   satya


Re: no response

2010-07-16 Thread satya swaroop
Hi,
   I am sorry, the mail you sent was in my sent-mail folder and I did not
look at it. I am going to check it now and will definitely tell you
everything.

regards,
  satya


Re: problem with storing??

2010-07-16 Thread satya swaroop
Hi,
I checked the admin page and indexing works for other files. In the log
files I do not see anything when I send the documents; I checked the
Catalina (Tomcat) log. I changed the dismax handler from q=*:* to q=   . I
at least get a response when I send PDF/HTML files, but I do not even get
one for the .doc files.


regards,
  swaroop


problem with storing??

2010-07-15 Thread satya swaroop
Hi all,
   I am new to Solr; I followed the wiki and got everything working.
But when I send any HTML/TXT/PDF document, the response is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">576</int></lst>
</response>

But when I search in Solr I do not find the document. Can anyone tell me
what needs to be done?
The curl command I used for the above output is:

curl 'http://localhost:8080/solr/update/extract?literal.id=doc1000&commit=true&fmap.content=text'
-F myfi...@java.pdf

regards,
 satya
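
The response quoted above can be read programmatically. The sketch below
parses the responseHeader with Python's standard library; the XML string is
the one from this message and parse_response_header is a hypothetical
helper. A status of 0 only says that Solr processed the update request
without reporting an error; on its own it does not show which field the
extracted text was mapped to or whether that field is the one being
searched.

```python
import xml.etree.ElementTree as ET


def parse_response_header(xml_text):
    """Return the integer entries of a Solr XML responseHeader as a dict."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    header = root.find("lst[@name='responseHeader']")
    return {el.get("name"): int(el.text) for el in header.findall("int")}


reply = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<response>\n"
    '<lst name="responseHeader"><int name="status">0</int>'
    '<int name="QTime">576</int></lst>\n'
    "</response>"
)
print(parse_response_header(reply))
```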


Re: problem with storing??

2010-07-15 Thread satya swaroop
hi,
   I sent a commit after adding the documents, but the problem is the same.

regards,
  satya


no response

2010-07-15 Thread satya swaroop
Hi all,
I have a problem with Solr. When I send .doc documents I do not get any
response.
  Example:
 sa...@geodesic-desktop:~/Desktop$  curl "http://localhost:8080/solr/update/extract?stream.file=/home/satya/Desktop/InvestmentDecleration.doc&stream.contentType=application/msword&literal.id=Invest.doc"
sa...@geodesic-desktop:~/Desktop$


Could anybody tell me what to do?


Re: indexing rich documents

2010-07-14 Thread satya swaroop
Yes, I checked the extraction request handler but could not get the
information I needed. I installed tika-0.7 and copied the jar files into
the Solr home library. When I started sending PDF/HTML files I got a lazy
loading error. I am using Tomcat and Solr 1.4.


Re: indexing with pdf files problem

2010-07-13 Thread satya swaroop
Hi,
   I installed Tika, copied its jar files into the Solr home library, and
also gave the path to the Tika configuration file. But the error is the
same. The Tika config file is as follows:


<?xml version="1.0" encoding="UTF-8"?>
<properties>
<mimeTypeRepository resource="/opt/tika-0.7/tika-core/target/classes/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>

<parsers>
 <parser name="text-xml" class="org.apache.tika.parser.xml.XMLParser">
  <namespace>http://purl.org/dc/elements/1.1/</namespace>
  <mime>application/xml</mime>
  <extract>
   <content name="title" xpathSelect="//dc:title"/>
   <content name="subject" xpathSelect="//dc:subject"/>
   <content name="creator" xpathSelect="//dc:creator"/>
   <content name="description" xpathSelect="//dc:description"/>
   <content name="publisher" xpathSelect="//dc:publisher"/>
   <content name="contributor" xpathSelect="//dc:contributor"/>
   <content name="type" xpathSelect="//dc:type"/>
   <content name="format" xpathSelect="//dc:format"/>
   <content name="identifier" xpathSelect="//dc:identifier"/>
   <content name="language" xpathSelect="//dc:language"/>
   <content name="rights" xpathSelect="//dc:rights"/>
   <content name="outLinks">
    <regexSelect>
    <![CDATA[
([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?)
    ]]>
    </regexSelect>
   </content>
  </extract>
 </parser>
 <parser name="parse-msword" class="org.apache.tika.parser.msword.MsWordParser">
  <mime>application/msword</mime>
  <extract>
   <content name="fullText" textSelect="fullText"/>
   <content name="outLinks">
    <regexSelect>
    <![CDATA[
([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?)
    ]]>
    </regexSelect>
   </content>
  </extract>
 </parser>
 <parser name="parse-msexcel" class="org.apache.tika.parser.msexcel.MsExcelParser">
  <mime>application/vnd.ms-excel</mime>
  <extract>
   <content name="fullText" textSelect="fullText"/>
   <content name="outLinks">
    <regexSelect>
    <![CDATA[
([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?)
    ]]>
    </regexSelect>
   </content>
  </extract>
 </parser>
 <parser name="parse-mspowerpoint" class="org.apache.tika.parser.mspowerpoint.MsPowerPointParser">
  <mime>application/vnd.ms-powerpoint</mime>
  <extract>
   <content name="fullText" textSelect="fullText"/>
   <content name="title" textSelect="title"/>
   <content name="author" textSelect="author"/>
   <content name="subject" textSelect="subject"/>
   <content name="outLinks">
    <regexSelect>
    <![CDATA[
([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?)
    ]]>
    </regexSelect>
   </content>
  </extract>
 </parser>
 <parser name="parse-html" class="org.apache.tika.parser.html.HtmlParser">
  <mime>text/html</mime>
  <mime>application/x-asp</mime>
  <extract>
   <content name="fullText" textSelect="fullText"/>
   <content name="title" textSelect="title"/>
   <content name="outLinks">
    <regexSelect>
    <![CDATA[
([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?)
    ]]>
    </regexSelect>
   </content>
  </extract>
 </parser>
 <!--

 <parser name="parse-html" class="org.apache.tika.parser.html.NekoHtmlParser">
  <mime>text/html</mime>
  <mime>application/x-asp</mime>
  .
  <extract>
   <content name="fullText" xpathSelect="//*"/>
   <content name="title" xpathSelect="//title"/>
   <content name="outLinks">
    <regexSelect>
    <![CDATA[
([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?)
    ]]>
    </regexSelect>
   </content>
  </extract>
 </parser>

 -->
 <parser mame="parse-rtf" class="org.apache.tika.parser.rtf.RTFParser">
  <mime>application/rtf</mime>
  <extract>
   <content name="fullText" textSelect="fullText"/>
   <content name="outLinks">
    <regexSelect>
    <![CDATA[
([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?)
    ]]>
    </regexSelect>
   </content>
  </extract>
 </parser>
 <parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
  <mime>application/pdf</mime>
  <extract>
   <content name="fullText" textSelect="fullText"/>
   <content name="title" textSelect="title"/>
   <content name="author" textSelect="author"/>
   <content name="creator" textSelect="creator"/>
   <content name="summary" textSelect="summary"/>
   <content name="keywords" textSelect="keywords"/>
   <content name="producer" textSelect="producer"/>
   <content name="subject" textSelect="subject"/>
   <content name="trapped" textSelect="trapped"/>
   <content name="creationDate" textSelect="creationDate"/>
 

indexing rich documents

2010-07-13 Thread satya swaroop
Hi all,
 I am new to Solr. I followed the wiki and got the Solr admin page running
successfully. Indexing works well for XML files, but I am unable to index
rich documents. I also followed the wiki's instructions for rich documents,
but could not get it to work. When I send a PDF/HTML file, the error I get
is a lazy loading error. Can anyone give a detailed description of how to
make rich documents indexable?
 I use Tomcat and work on Ubuntu. The home directory for Solr is
/opt/solr/example and the Catalina home is /opt/tomcat6.


Thanks & regards,
 swaroop


Re: indexing rich documents

2010-07-13 Thread satya swaroop
Hi,
Yes, I followed the wiki. Can you now tell me the procedure for it?
  Regards,
   swaroop


indexing with pdf files problem

2010-07-12 Thread satya swaroop
Hi all,
  I am working with Solr on Tomcat. Indexing works fine for XML files,
but when I send docs, HTML files, or PDFs through curl I get a "lazy
loading" error. Can you tell me how to fix this? I am working on Ubuntu.
Solr home is /opt/example and Tomcat is in /opt/tomcat6. The output when I
send a PDF file is as follows:


Apache Tomcat/6.0.26 - Error report
HTTP Status 500 - lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: java.lang.NullPointerException
at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:76)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:73)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:99)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:84)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:61)
at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:74)
... 17 more
type: Status report
message: lazy loading error

org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at