Tomcat EXE Source Code

2011-02-25 Thread rajini maski
  Can anybody help me get the source code of the Tomcat exe
file, i.e., the source code of the installer .exe?

Thanks..


Re: CUSTOM JSP FOR APACHE SOLR

2011-02-25 Thread Paul Libbrecht

From looking at the source, I see only the following option available for me to 
write JSPs that display search results: adjust SolrDispatchFilter to treat a 
JspResponseWriter specially by:
- enriching the http-request with the search queries and responses
- forwarding the request down the chain

It sounds like a somewhat difficult modification of SolrDispatchFilter, but it 
sounds useful to me (I would get all the webapp services in my jsp, e.g. for 
i18n, sessions, ...).

It would also allow other implementations such as servlets.

Thanks in advance for advice on whether it's a good or bad thing to do.

paul


On 24 Feb 2011, at 23:47, Paul Libbrecht wrote:

 Hello list,
 
 as suggested below, I tried to implement a custom ResponseWriter that would 
 evaluate a JSP but that seems impossible: the HttpServletRequest and the 
 HttpServletResponse are not available anymore.
 
 Have I missed something?
 Should I rather do a RequestHandler?
 Does anyone know an artificial way to run a JSP? (I'd rather avoid that.)
 
 thanks in advance
 
 paul
 
 
 On 2 Feb 2011, at 20:42, Tomás Fernández Löbbe wrote:
 
 Hi Paul, I don't fully understand what you want to do. The way, I think,
 SolrJ is intended to be used is from a client application (outside Solr). If
 what you want is something like what's done with Velocity, I think you could
 implement a response writer that renders the JSP and sends it in the
 response.
 
 Tomás
 
 
 On Mon, Jan 31, 2011 at 6:25 PM, Paul Libbrecht p...@hoplahup.net wrote:
 
 Tomas,
 
 I also know velocity can be used and works well.
 I would be interested in a simpler way to have the Solr objects
 available in a JSP than writing a custom JSP processor as a request handler;
 indeed, this seems to be the way SolrJ is expected to be used per the wiki
 page.
 
 Actually I migrated to Velocity (which I like less than JSP) just because I
 did not find an answer to this question.
 
 paul
 
 
 On 31 Jan 2011, at 21:53, Tomás Fernández Löbbe wrote:
 
 Hi John, you can use whatever you want for building your application, using
 Solr on the backend (JSP included). You should find all the information you
 need on Solr's wiki page:
 http://wiki.apache.org/solr/
 
 including some client libraries to easily integrate your application with
 Solr:
 http://wiki.apache.org/solr/IntegratingSolr
 
 For fast prototyping you could use Velocity:
 http://wiki.apache.org/solr/VelocityResponseWriter
 
 Anyway, I recommend you start with Solr's tutorial:
 http://lucene.apache.org/solr/tutorial.html
 
 Good luck,
 Tomás
 
 2011/1/31 JOHN JAIRO GÓMEZ LAVERDE jjai...@hotmail.com
 
 
 
 SOLR LUCENE
 DEVELOPERS
 
 Hi, I am new to Solr and I would like to make a custom search page for
 enterprise users in JSP that takes the results of Apache Solr.
 
 - Where can I find some useful examples for that topic?
 - Is JSP the correct approach to solve my requirement?
 - If not, what is the best solution to build a customized search page for my
 users?
 
 Thanks
 from South America
 
 JOHN JAIRO GOMEZ LAVERDE
 Bogotá - Colombia
 
 
 
 



Re: DIH regex remove email + extract url

2011-02-25 Thread Rosa (Anuncios)

Hi Koji,

My question was more about the solr DIH syntax. It doesn't work either 
with the new regex.


Especially the syntax for this:

<field column="source" xpath="/product/url" regex="http:\/\/(.*?)\/(.*)"
/> <--- Is it correct? (not the regex, the syntax)?


Example: url=http://www.abcd.com/product.php?id=324 -- I want to index 
source = abcd.com


thanks for your help


On 25/02/2011 01:43, Koji Sekiguchi wrote:

Hi Rosa,


<field column="description" xpath="/product/content"
regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}" replaceWith="" />


Shouldn't it be regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,4}"?

<field column="source" xpath="/product/url"
regex="http://(.*?)\\/(.*)" />


Example: url=http://www.abcd.com/product.php?id=324 -- I want to 
index source = abcd.com


Probably it could be regex="http:\/\/(.*?)\/(.*)"

I use a regex web tool:

http://www.regexplanet.com/simple/index.html

Koji




Re: Ramdirectory

2011-02-25 Thread Matt Weber
I have used this without issue.  In the example solrconfig.xml replace
this line:

<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

with this one:

<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>

Thanks,
Matt Weber

On Thu, Feb 24, 2011 at 7:47 PM, Bill Bell billnb...@gmail.com wrote:
 Thanks - yeah that is why I asked how to use it. But I still don't know
 how to use it.

 https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/
 RAMDirectoryFactory.html


 https://issues.apache.org/jira/browse/SOLR-465

 <directoryProvider class="org.apache.lucene.store.RAMDirectory">
 <!-- Parameters as required by the implementation -->
 </directoryProvider>


 Is that right? Examples? Options?

 Where do I put that in solrconfig.xml ? Do I put it in
 mainIndex/directoryProvider ?

 I know that SOLR-465 is more generic, but
 https://issues.apache.org/jira/browse/SOLR-480 seems easier to use.



 Thanks.


 On 2/24/11 6:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


: I could not figure out how to set up the ramdirectory option in
solrconfig.xml. Does anyone have an example for 1.4?

it wasn't an option in 1.4.

as Koji had already mentioned in the other thread where you chimed in
and asked about this, it was added in the 3x branch...

http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td25671
66.html



-Hoss






-- 
Thanks,
Matt Weber


upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread jo

I have tried the steps indicated here:
http://wiki.apache.org/solr/ExtractingRequestHandler
http://wiki.apache.org/solr/ExtractingRequestHandler 

and when I try to parse a document nothing happens, no error.. I have
copied the jar files everywhere, and nothing.. can anyone give me the steps
on how to upgrade just Tika? btw, Solr 1.4.1 currently ships with Tika 0.4

thank you


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/upgrading-to-Tika-0-9-on-Solr-1-4-1-tp2570526p2570526.html
Sent from the Solr - User mailing list archive at Nabble.com.


LetterTokenizer + EdgeNGram + apostrophe in query = invalid result

2011-02-25 Thread Matt Weber
I have the following field defined in my schema:

  <fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.LetterTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
              maxGramSize="25" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.LetterTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="person" type="ngram" indexed="true" stored="true" />

I have the default field set to person and have indexed the
following document:

<add>
   <doc>
   <field name="id"><![CDATA[1001116609]]></field>
   <field name="person"><![CDATA[Vincent M D'Onofrio]]></field>
   </doc>
</add>


The following queries return the result as expected using the standard
request handler:

vincent m d onofrio
d'o
onofrio
d onofrio

The following query fails:

d'onofrio

This is weird because d'o returns a result.  As soon as I type the
n I start to get no results.  I ran this through the field analysis
page and it shows that this query is being tokenized correctly and
should yield a result.

I am using a build of trunk Solr (r1073990) and the example
solrconfig.xml.  I am also using the example schema with the addition
of my ngram field.

Any ideas?  I have tried this with other words containing an
apostrophe and they all stop returning results after 4 characters.


Thanks,
Matt Weber


Partial search extremly slow

2011-02-25 Thread javaxmlsoapdev

Since my users wanted partial-search functionality, I had to introduce the
following. I have declared two EdgeNGram filters, one with side="back" and
one with side="front", since they wanted partial search working from either
side. 

<fieldType name="edgytext" class="solr.TextField">
 <analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.EdgeNGramFilterFactory"
          minGramSize="1" maxGramSize="25" side="back"/>
  <filter class="solr.EdgeNGramFilterFactory"
          minGramSize="1" maxGramSize="25" side="front"/>
 </analyzer>
</fieldType>

When executing a search (which brings back 4K-plus records from the index),
the response time is extremely slow. 

The two db columns which I index and search against are huge, and one of
them is of type CLOB. This is to give you an idea that this db column of
type CLOB is being indexed with edgytext and also searched upon.
From the documentation I understand partial search is slow due to the gram
nature. What's the best way to implement this functionality and still get
good response time?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-search-extremly-slow-tp2572861p2572861.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr running on many sites

2011-02-25 Thread Stefan Matheis
Hi Grant,

Multi Sites == Multi Cores? :) http://wiki.apache.org/solr/MultiCore have a look

Regards
Stefan
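
As a rough sketch of what that wiki page describes, a minimal multi-core
solr.xml might look like this (the core names and instanceDirs below are
illustrative, one core per site):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core (its own index + config) per site -->
    <core name="site1" instanceDir="site1"/>
    <core name="site2" instanceDir="site2"/>
  </cores>
</solr>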

On Fri, Feb 25, 2011 at 3:15 AM, Grant Longhurst
grant.longhu...@ecorner.com.au wrote:
 Hi,



 We are an e-commerce service provider and are looking at using Solr for
 all the site searches. Was just wondering what the best way is to set it
 up for many sites instead of just one?



 Thanks.



 Regards,

 Grant Longhurst

 Technical Consultant

 eCorner Pty Ltd



 Phone: +61 2 9494 0200

 Email: grant.longhu...@ecorner.com.au
 mailto:grant.longhu...@ecorner.com.au



 Web - www.ecorner.com.au

 Buy a store - www.ecornerstoresplus.com.au
 http://www.ecornerstoresplus.com.au/

 Buy email security - www.cloudmark.com.au

 Need Support -  www.ecorner.com.au/support
 http://www.ecorner.com.au/support

 Need Help! -  help.ecorner.com http://help.ecorner.com/






Re: Make syntax highlighter caseinsensitive

2011-02-25 Thread Tarjei Huse
Hi,
On 02/25/2011 02:06 AM, Koji Sekiguchi wrote:
 (11/02/24 20:18), Tarjei Huse wrote:
 Hi,

 I got an index where I have two fields, body and caseInsensitiveBody.
 Body is indexed and stored while caseInsensitiveBody is just indexed.

 The idea is that by not storing the caseInsensitiveBody I save some
 space and gain some performance. So I query against the
 caseInsensitiveBody and generate highlighting from the case sensitive
 one.

 The problem is that as a result, I am missing highlighting terms. For
 example, when I search for "solr" and get a match in caseInsensitiveBody,
 but the term appears as "Solr" in the original document, no highlighting
 is done.

 Is there a way around this? Currently I am using the following
 highlighting params:
  'hl' => 'on',
  'hl.fl' => 'header,body',
  'hl.usePhraseHighlighter' => 'true',
  'hl.highlightMultiTerm' => 'true',
  'hl.fragsize' => 200,
  'hl.regex.pattern' => '[-\w ,/\n\\']{20,200}',

 Tarjei,

 Maybe a silly question, but why don't you make the body field case
 insensitive and eliminate the caseInsensitiveBody field, and then query and
 highlight on just the body field?
Not silly. I need to support usage scenarios where case matters as well
as scenarios where case doesn't matter.

The best part would be if I could use one field for this, store it and
handle case sensitivity in the query phase, but as I understand it, that
is not possible.

Regards,
Tarjei

 Koji


-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Re: Tomcat EXE Source Code

2011-02-25 Thread Jan Høydahl
Why do you want it?
Try asking on the Tomcat list :)
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25. feb. 2011, at 09.16, rajini maski wrote:

   Can anybody help me get the source code of the Tomcat exe
  file, i.e., the source code of the installer .exe?
 
 Thanks..



Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Jan Høydahl
Your best bet is perhaps upgrading to latest 1.4 branch, i.e. 1.4.2-dev 
(http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/)
It includes Tika 0.8-SNAPSHOT and is a compatible drop-in (war/jar) replacement 
with lots of other bug fixes you'd also like (check changes.txt).

svn co http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
cd branch-1.4
ant dist

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 21.42, jo wrote:

 
 I have tried the steps indicated here:
 http://wiki.apache.org/solr/ExtractingRequestHandler
 http://wiki.apache.org/solr/ExtractingRequestHandler 
 
 and when I try to parse a document nothing happens, no error.. I have
 copied the jar files everywhere, and nothing.. can anyone give me the steps
 on how to upgrade just Tika? btw, Solr 1.4.1 currently ships with Tika 0.4
 
 thank you
 
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/upgrading-to-Tika-0-9-on-Solr-1-4-1-tp2570526p2570526.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tomcat EXE Source Code

2011-02-25 Thread rajini maski
I am trying to configure multiple Tomcat instances, with that many services
configured as well. Right now that particular Tomcat exe lets me create only
one. If the same exe is run again and I try to configure it in another
destination folder, it throws an exception that the service already exists.
How can I fix this problem? Any suggestions?


On Fri, Feb 25, 2011 at 3:18 PM, Jan Høydahl jan@cominvent.com wrote:

 Why do you want it?
 Try asking on the Tomcat list :)
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 On 25. feb. 2011, at 09.16, rajini maski wrote:

    Can anybody help me get the source code of the Tomcat exe
   file, i.e., the source code of the installer .exe?
 
  Thanks..




Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Markus Jelsma
You don't want to use 0.8 if you're parsing PDF.

 Your best bet is perhaps upgrading to latest 1.4 branch, i.e. 1.4.2-dev
 (http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/) It
 includes Tika 0.8-SNAPSHOT and is a compatible drop-in (war/jar)
 replacement with lots of other bug fixes you'd also like (check
 changes.txt).
 
 svn co http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4
 cd branch-1.4
 ant dist
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 On 24. feb. 2011, at 21.42, jo wrote:
  I have tried the steps indicated here:
  http://wiki.apache.org/solr/ExtractingRequestHandler
  http://wiki.apache.org/solr/ExtractingRequestHandler
  
  and when I try to parse a document nothing happens, no error.. I
  have copied the jar files everywhere, and nothing.. can anyone give me
  the steps on how to upgrade just Tika? btw, Solr 1.4.1 currently ships
  with Tika 0.4
  
  thank you


Re: DIH regex remove email + extract url

2011-02-25 Thread Koji Sekiguchi

Hi Rosa,

Are you sure you have transformer="RegexTransformer" in your <entity/>?


My question was more about the solr DIH syntax. It doesn't work either with the 
new regex.

Especially the syntax for this:

<field column="source" xpath="/product/url" regex="http:\/\/(.*?)\/(.*)" />
<--- Is it correct? (not
the regex, the syntax)?


In this case, I think you need to have two field names in groupNames,
because you have two groups, (.*?) and (.*), in your regex.
But I'm not confident. Please try if you'd like...

Koji
--
http://www.rondhuit.com/en/
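
A hedged sketch of what Koji suggests (untested; the output field names
"source" and "urlpath" are made up for illustration):

<field column="url" xpath="/product/url"
       regex="http:\/\/(.*?)\/(.*)" groupNames="source,urlpath" />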


Re: CUSTOM JSP FOR APACHE SOLR

2011-02-25 Thread Erik Hatcher

On Feb 1, 2011, at 08:58 , Estrada Groups wrote:

 Has anyone noticed the Rails application that installs with Solr 4.0? I am 
 interested to hear some feedback on that one...

I guess you're talking about the client/ruby/flare stuff?   It's been untouched 
for quite a while and has not been upgraded to Rails3.  It still works, and has 
a lot of (in my biased opinion! :) slick features that folks can borrow from.  
At one point Koji had a public demo site using it that I thought was way cool.  
Flare was used as a basis for the initial version of Blacklight 
http://projectblacklight.org/.  Blacklight has since evolved dramatically 
into a very full featured (and mostly general purpose) way cool front-end to 
Solr.

I still sometimes fire up Flare for demonstration purposes (saved searches that 
become facet.query's, Simile Timeline integration, and pie charts demo nicely).

At this point, Flare is as-is... try it out if you are Ruby/Rails savvy.

Erik



Re: Tomcat EXE Source Code

2011-02-25 Thread Gora Mohanty
On Fri, Feb 25, 2011 at 3:42 PM, rajini maski rajinima...@gmail.com wrote:
 I am trying to configure tomcat multi instances with that many number of
 services configured too. Right now that particular tomcat exe let create
 only one. If the same exe run again and tried to configure at other
 destination folder ,It throws an exception as service already exists.How can
 I fix this problem.. Any suggestions?
[...]

This question properly belongs on a Tomcat list, but if I understand your need
correctly, you could try the results returned by searching Google for
"tomcat multiple instances". There is no need to modify the Tomcat source
code.

Regards,
Gora


Re: problem when search grouping word

2011-02-25 Thread Chamnap Chhorn
Any idea?

On Thu, Feb 24, 2011 at 6:49 PM, Chamnap Chhorn chamnapchh...@gmail.com wrote:

 There are many product names. How could I list them all, and the list is
 growing fast as well?


 On Thu, Feb 24, 2011 at 5:25 PM, Grijesh pintu.grij...@gmail.com wrote:


  maybe synonyms will help

 -
 Thanx:
 Grijesh
 http://lucidimagination.com
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/problem-when-search-grouping-word-tp2566499p2566550.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


solr score issue

2011-02-25 Thread Bagesh Sharma

Hi sir, 

Can anyone explain to me how this score is being calculated? I am searching
here for "software engineer" using the dismax handler. The total number of
documents indexed is 477, and the query returns 28 results.

Query is like that -
   q=software+engineer&fq=location%3Adelhi

dismax setting is - 

   <str name="qf">
     alltext
     title^2
     functional_role^1
   </str>

   <str name="pf">
     body^100
   </str>


Here the alltext field is made by copying all fields (see the sketch below).
The body field contains the detail of the job.
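
A catch-all field like alltext is typically declared in schema.xml with
copyField directives; the wildcard form below is a hedged sketch, not
necessarily the poster's exact setup:

<field name="alltext" type="text" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*" dest="alltext"/>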

I am unable to understand how these scores have been calculated. Where does
score calculation start, and what is the default score for any term match?

<str name="20080604/3eb9a7b30131a782a0c0a0e2cdb2b6b8.html">

0.5901718 = (MATCH) sum of:
  0.0032821721 = (MATCH) sum of:
0.0026574256 = (MATCH) max plus 0.1 times others of:
  0.0026574256 = (MATCH) weight(alltext:softwar in 339), product of:
0.0067262817 = queryWeight(alltext:softwar), product of:
  3.6121683 = idf(docFreq=34, maxDocs=477)
  0.0018621174 = queryNorm
0.39508092 = (MATCH) fieldWeight(alltext:softwar in 339), product
of:
  1.0 = tf(termFreq(alltext:softwar)=1)
  3.6121683 = idf(docFreq=34, maxDocs=477)
  0.109375 = fieldNorm(field=alltext, doc=339)
6.2474643E-4 = (MATCH) max plus 0.1 times others of:
  6.2474643E-4 = (MATCH) weight(alltext:engin in 339), product of:
0.0032613424 = queryWeight(alltext:engin), product of:
  1.7514161 = idf(docFreq=224, maxDocs=477)
  0.0018621174 = queryNorm
0.19156113 = (MATCH) fieldWeight(alltext:engin in 339), product of:
  1.0 = tf(termFreq(alltext:engin)=1)
  1.7514161 = idf(docFreq=224, maxDocs=477)
  0.109375 = fieldNorm(field=alltext, doc=339)
  0.5868896 = weight(body:softwar engin^100.0 in 339), product of:
0.9995919 = queryWeight(body:softwar engin^100.0), product of:
  100.0 = boost
  5.3680387 = idf(body: softwar=34 engin=223)
  0.0018621174 = queryNorm
0.58712924 = fieldWeight(body:softwar engin in 339), product of:
  1.0 = tf(phraseFreq=1.0)
  5.3680387 = idf(body: softwar=34 engin=223)
  0.109375 = fieldNorm(field=body, doc=339)
</str>


Please advise.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-score-issue-tp2574680p2574680.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr admin result page error

2011-02-25 Thread Bernd Fehling
Hi Markus,

the result of my investigation is that Lucene currently can only handle
UTF-8 code within the BMP [Basic Multilingual Plane] (plane 0), i.e. code
points up to 0xFFFF.

Any code above the BMP might end in unpredictable results, which is bad.
If you get invalid UTF-8 from the index and use wt=xml, you get the error
page. This is due to encoding=text/xml and charset=utf-8 in the header.
If you use wt=json then the encoding is text/plain and charset=utf-8.
Because of text/plain you don't get an error page, but the content is
nevertheless invalid. I guess it replaces all invalid code with the UTF-8
BOM. So currently there is no solution, not even with JSON.

This should (hopefully) be fixed with Lucene 3.1.

Regards,
Bernd


Am 11.02.2011 15:50, schrieb Markus Jelsma:
 No i haven't located the issue. It might be Solr but it could also be Xerces 
 having trouble with it. You can possibly work around the problem by using the 
 JSONResponseWriter.
 
 On Friday 11 February 2011 15:45:23 Bernd Fehling wrote:
 Hi Markus,

 yes it looks like the same issue. There is also a \u utf8-code in your
 dump. Till now I followed it into XMLResponseWriter.
 Some steps before the result in a buffer looks good and the utf8-code is
 correct. Really hard to debug this freaky problem.

 Have you looked deeper into this and located the bug?

 It is definately a bug and has nothing to do with firefox.

 Regards,
 Bernd

 Am 11.02.2011 13:48, schrieb Markus Jelsma:
 It looks like you hit the same issue as i did a while ago:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg46510.html

 On Friday 11 February 2011 08:59:27 Bernd Fehling wrote:
 Dear list,

 after loading some documents via DIH which also include urls
 I get this yellow XML error page as search result from solr admin GUI
 after a search.
 It says "XML processing error: not well-formed".
 The code it argues about is:

 <arr name="dcurls">
 <str>http://eprints.soton.ac.uk/43350/</str>
 <str>http://dx.doi.org/doi:10.1112/S0024610706023143</str>
 <str>Martinez-Perez, Conchita and Nucinkis, Brita E.A. (2006)
 Cohomological dimension of Mackey functors for infinite groups. Journal
 of the London Mathematical Society, 74, (2), 379-396.
 (doi:10.1112/S0024610706023143
 &lt;http://dx.doi.org/10.1112/S002461070602314\u&gt;)</str></arr>

 See the \u utf8-code in the last line.

 1. the loaded data is valid, well-formed and checked with xmllint. No
 errors. 2. there is no \u utf8-code in the source data.
 3. the data is loaded via DIH without any errors.
 4. if opening the source-view of the result page with firefox there is
 also no \u utf8-code.

 Only idea I have is solr itself or the result page generation.

 How to proceed, what else to check?

 Regards,
 Bernd
 

-- 
*
Bernd FehlingUniversitätsbibliothek Bielefeld
Dipl.-Inform. (FH)Universitätsstr. 25
Tel. +49 521 106-4060   Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Tomcat EXE Source Code

2011-02-25 Thread Adam Estrada
Some of these links may help...

http://www.google.com/search?client=safari&rls=en&q=apache+tomcat+download&ie=UTF-8&oe=UTF-8

Adam


On Feb 25, 2011, at 3:16 AM, rajini maski wrote:

   Can anybody help me get the source code of the Tomcat exe
  file, i.e., the source code of the installer .exe?
 
 Thanks..



Re: Make syntax highlighter caseinsensitive

2011-02-25 Thread Koji Sekiguchi

(11/02/25 18:30), Tarjei Huse wrote:

Hi,
On 02/25/2011 02:06 AM, Koji Sekiguchi wrote:

(11/02/24 20:18), Tarjei Huse wrote:

Hi,

I got an index where I have two fields, body and caseInsensitiveBody.
Body is indexed and stored while caseInsensitiveBody is just indexed.

The idea is that by not storing the caseInsensitiveBody I save some
space and gain some performance. So I query against the
caseInsensitiveBody and generate highlighting from the case sensitive
one.

The problem is that as a result, I am missing highlighting terms. For
example, when I search for "solr" and get a match in caseInsensitiveBody,
but the term appears as "Solr" in the original document, no highlighting
is done.

Is there a way around this? Currently I am using the following
highlighting params:
  'hl' => 'on',
  'hl.fl' => 'header,body',
  'hl.usePhraseHighlighter' => 'true',
  'hl.highlightMultiTerm' => 'true',
  'hl.fragsize' => 200,
  'hl.regex.pattern' => '[-\w ,/\n\\']{20,200}',


Tarjei,

Maybe a silly question, but why don't you make the body field case
insensitive and eliminate the caseInsensitiveBody field, and then query and
highlight on just the body field?

Not silly. I need to support usage scenarios where case matters as well
as scenarios where case doesn't matter.

The best part would be if I could use one field for this, store it and
handle case sensitivity in the query phase, but as I understand it, that
is not possible.


Hi Tarjei,

If I understand it correctly, you want to highlight in a case-insensitive way.
If so, it is easy. You have:

body: indexed but not stored
caseInsensitiveBody: indexed and stored

and request hl.fl=caseInsensitiveBody ?

Koji
--
http://www.rondhuit.com/en/
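
In schema.xml terms, the setup Koji describes would look roughly like the
sketch below (the field type names text_cs and text_ci are placeholders for
a case-sensitive and a case-insensitive analyzer chain):

<field name="body" type="text_cs" indexed="true" stored="false"/>
<field name="caseInsensitiveBody" type="text_ci" indexed="true"
       stored="true"/>

and then query with &hl=on&hl.fl=caseInsensitiveBody.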


Re: Tomcat EXE Source Code

2011-02-25 Thread Jan Høydahl
 I am trying to configure multiple Tomcat instances, with that many services
 configured as well. Right now that particular Tomcat exe lets me create only
 one. If the same exe is run again and I try to configure it in another
 destination folder, it throws an exception that the service already exists.
 How can I fix this problem? Any suggestions?

http://stackoverflow.com/questions/179/use-multiple-catalina-base-to-setup-tomcat-6-instances-on-windows



Re: solr score issue

2011-02-25 Thread Jayendra Patil
Check the "Need help in understanding output of searcher.explain()
function" thread.

http://mail-archives.apache.org/mod_mbox/lucene-java-user/201008.mbox/%3CAANLkTi=m9a1guhrahpeyqaxhu9gta9fjbnr7-8-zi...@mail.gmail.com%3E

Regards,
Jayendra

On Fri, Feb 25, 2011 at 6:57 AM, Bagesh Sharma mail.bag...@gmail.com wrote:

 Hi sir,

 Can anyone explain to me how this score is being calculated? I am searching
 here for "software engineer" using the dismax handler. The total number of
 documents indexed is 477, and the query returns 28 results.

 Query is like that -
       q=software+engineer&fq=location%3Adelhi

 dismax setting is -

       <str name="qf">
             alltext
             title^2
             functional_role^1
        </str>

        <str name="pf">
              body^100
        </str>


 Here the alltext field is made by copying all fields.
 The body field contains the detail of the job.

 I am unable to understand how these scores have been calculated. Where does
 score calculation start, and what is the default score for any term match?

 <str name="20080604/3eb9a7b30131a782a0c0a0e2cdb2b6b8.html">

 0.5901718 = (MATCH) sum of:
  0.0032821721 = (MATCH) sum of:
    0.0026574256 = (MATCH) max plus 0.1 times others of:
      0.0026574256 = (MATCH) weight(alltext:softwar in 339), product of:
        0.0067262817 = queryWeight(alltext:softwar), product of:
          3.6121683 = idf(docFreq=34, maxDocs=477)
          0.0018621174 = queryNorm
        0.39508092 = (MATCH) fieldWeight(alltext:softwar in 339), product
 of:
          1.0 = tf(termFreq(alltext:softwar)=1)
          3.6121683 = idf(docFreq=34, maxDocs=477)
          0.109375 = fieldNorm(field=alltext, doc=339)
    6.2474643E-4 = (MATCH) max plus 0.1 times others of:
      6.2474643E-4 = (MATCH) weight(alltext:engin in 339), product of:
        0.0032613424 = queryWeight(alltext:engin), product of:
          1.7514161 = idf(docFreq=224, maxDocs=477)
          0.0018621174 = queryNorm
        0.19156113 = (MATCH) fieldWeight(alltext:engin in 339), product of:
          1.0 = tf(termFreq(alltext:engin)=1)
          1.7514161 = idf(docFreq=224, maxDocs=477)
          0.109375 = fieldNorm(field=alltext, doc=339)
  0.5868896 = weight(body:softwar engin^100.0 in 339), product of:
    0.9995919 = queryWeight(body:softwar engin^100.0), product of:
      100.0 = boost
      5.3680387 = idf(body: softwar=34 engin=223)
      0.0018621174 = queryNorm
    0.58712924 = fieldWeight(body:softwar engin in 339), product of:
      1.0 = tf(phraseFreq=1.0)
      5.3680387 = idf(body: softwar=34 engin=223)
      0.109375 = fieldNorm(field=body, doc=339)
 </str>


 Please advise.
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-score-issue-tp2574680p2574680.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Question on writing custom UpdateHandler

2011-02-25 Thread Mark
I am trying to write my own custom UpdateHandler that extends 
DirectUpdateHandler2.


I would like to be able to query the current state of the index within 
the addDoc method. How would I be able to accomplish this?


I tried something like the following, but it was a big fat fail as it 
quickly created an enormous number of index files and I received a 
"too many open files" exception.


iwCommit.lock();
try {
    openWriter();
    docs = new IndexSearcher(writer.getReader()).search(query, MAX_DOCS);
} finally {
    iwCommit.unlock();
}

I'm guessing the call to new IndexSearcher is at fault but I'm unsure of 
a way around this.


Thanks for your help!


Re: Question on writing custom UpdateHandler

2011-02-25 Thread Mark
Or how can I perform a query on the current state of the index from 
within an UpdateProcessor?


Thanks

On 2/25/11 6:30 AM, Mark wrote:
I am trying to write my own custom UpdateHandler that extends 
DirectUpdateHandler2.


I would like to be able to query the current state of the index within 
the addDoc method. How would I be able to accomplish this?


I tried something like the following, but it was a big fat fail as it 
quickly created an enormous number of index files and I received a 
"too many open files" exception.


iwCommit.lock();
try {
    openWriter();
    docs = new IndexSearcher(writer.getReader()).search(query, MAX_DOCS);
} finally {
    iwCommit.unlock();
}

I'm guessing the call to new IndexSearcher is at fault but I'm unsure 
of a way around this.


Thanks for your help!


Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Jo,

You may consider checking out Tika trunk, where we recently have a Tika JAX-RS 
web service [1] committed as part of the tika-server module. You could probably 
wire DIH into it and accomplish the same thing.

Cheers,
Chris

[1] https://issues.apache.org/jira/browse/TIKA-593

On Feb 24, 2011, at 12:42 PM, jo wrote:

 
 I have tried the steps indicated here:
 http://wiki.apache.org/solr/ExtractingRequestHandler
 http://wiki.apache.org/solr/ExtractingRequestHandler 
 
 and when I try to parse a document nothing happens, no error.. I have
 copied the jar files everywhere, and nothing.. can anyone give me the steps
 on how to upgrade just Tika? btw, Solr 1.4.1 currently ships with Tika 0.4
 
 thank you
 
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/upgrading-to-Tika-0-9-on-Solr-1-4-1-tp2570526p2570526.html
 Sent from the Solr - User mailing list archive at Nabble.com.


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



manually editing spellcheck dictionary

2011-02-25 Thread Tanner Postert
I'm using an index-based spellcheck dictionary and I was wondering if there
were a way for me to manually remove certain words from the dictionary.

Some of my content has some misspellings; for example, when I search for
the word "sherrif" (which should be spelled "sheriff"), I get recommendations
like "sherriff" or "sherri" instead. If I could remove those words, it seems
like the system would work a little better.


Re: manually editing spellcheck dictionary

2011-02-25 Thread Sujit Pal
If the dictionary is a Lucene index, wouldn't it be as simple as deleting by
term? Something like this (Lucene 3.x; the index path below is illustrative,
and "word" is the field the index-based spellchecker uses):

// open the spellcheck dictionary index in read-write mode
Directory dir = FSDirectory.open(new File("/path/to/spellchecker/index"));
IndexReader sdreader = IndexReader.open(dir, false); // false = writable
sdreader.deleteDocuments(new Term("word", "sherri"));
...
sdreader.close();

I am guessing your dictionary is built dynamically using content words.
If so, you may want to run the words through an aspell-like filter
(jazzy.sf.net is a Java implementation of aspell that works quite well
with single words) to determine whether more of these should be removed, and
whether they should be added in the first place.

-sujit

On Fri, 2011-02-25 at 10:41 -0700, Tanner Postert wrote:
 I'm using an index-based spellcheck dictionary and I was wondering if there
 were a way for me to manually remove certain words from the dictionary.
 
 Some of my content has some misspellings; for example, when I search for
 the word "sherrif" (which should be spelled "sheriff"), I get recommendations
 like "sherriff" or "sherri" instead. If I could remove those words, it seems
 like the system would work a little better.



Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread jo

You guys are great.. I will stick with the release version for now, and if I
have problems parsing I will give the branch jars a try. The reason I am
looking to upgrade Tika is that Tika keeps improving on things like
languages, MIME type support, and so on. 

thanks again

JO
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/upgrading-to-Tika-0-9-on-Solr-1-4-1-tp2570526p2576658.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Darx Oman
hi
if you want to index pdf files then use Tika 0.6,
because 0.7 and 0.8 do not handle PDF parsing correctly


Re: Omitting tf but not positions

2011-02-25 Thread Jan Høydahl
I also have a case (yellow pages) where IDF comes in and destroys the ranking.
A company listing with a word which occurs in few other listings is not 
necessarily better than others just because of that. When it gets to the 
extreme value of IDF=1, we get an artificially high IDF boost.

It is not killed by omitNorms, neither by omitTermFrequencyAndPositions. Any 
per-field way to get rid of the IDF effect?
Or should I override idf() in Similarity?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 15. des. 2010, at 13.27, Robert Muir wrote:

 On Wed, Dec 15, 2010 at 3:09 AM, Jan Høydahl / Cominvent
 jan@cominvent.com wrote:
 Any way to disable TF/IDF normalization without also disabling positions?
 
 
 see Similarity.tf(float) and Similarity.tf(int)
 
 if you want to change this for both terms and phrases just override
 Similarity.tf(float), since by default Similarity.tf(int) delegates to
 that.
 otherwise, override both.
 
 of course the big limitation being you cant customize Similarity per-field 
 yet.



Re: Omitting tf but not positions

2011-02-25 Thread Robert Zotter

Jan,

You are correct, you'll need your own Similarity class.

Have a look at SweetSpotSimilarity 
(http://lucene.apache.org/java/3_0_3/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html)


On 2/25/11 10:57 AM, Jan Høydahl wrote:

I also have a case (yellow-page) where IDF comes in and destroys the rank.
A company listing with a word which occurs in few other listings is not 
necessarily better than others just because of that. When it gets to the 
extreme value of IDF=1, we get an artificially high IDF boost.

It is not killed by omitNorms, neither by omitTermFrequencyAndPositions. Any 
per-field way to get rid of the IDF effect?
Or should I override idf() in Similarity?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 15. des. 2010, at 13.27, Robert Muir wrote:


On Wed, Dec 15, 2010 at 3:09 AM, Jan Høydahl / Cominvent
jan@cominvent.com  wrote:

Any way to disable TF/IDF normalization without also disabling positions?


see Similarity.tf(float) and Similarity.tf(int)

if you want to change this for both terms and phrases just override
Similarity.tf(float), since by default Similarity.tf(int) delegates to
that.
otherwise, override both.

of course the big limitation being you cant customize Similarity per-field yet.


Re: Omitting tf but not positions

2011-02-25 Thread Robert Muir
On Fri, Feb 25, 2011 at 1:57 PM, Jan Høydahl jan@cominvent.com wrote:
 I also have a case (yellow-page) where IDF comes in and destroys the rank.
 A company listing with a word which occurs in few other listings is not 
 necessarily better than others just because of that. When it gets to the 
 extreme value of IDF=1, we get an artificially high IDF boost.

 It is not killed by omitNorms, neither by omitTermFrequencyAndPositions. Any 
 per-field way to get rid of the IDF effect?
 Or should I override idf() in Similarity?


Hi Jan, my reply was back in december. These days in lucene/solr
trunk, you can customize Similarity on a per-field basis.
So your yellow-page field can have a completely different similarity
(tf, idf, lengthnorm, etc).

For that field you can disable things like TF and IDF entirely, e.g.
just set it to a constant such as 1 or if you think thats too risky,
consider an alternative ranking scheme that doesn't use the IDF at all
such as the example in
https://issues.apache.org/jira/browse/LUCENE-2864

For now, you have to implement SimilarityProvider in a java class
(with something like a hashmap returning different similaritys for
different fields), and set this up with the similarity hook in
schema.xml, but there is an issue open to make this easier:
https://issues.apache.org/jira/browse/SOLR-2338
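
A rough sketch of that last step, under the assumption of a hypothetical
user-written provider class (the class name below is made up):

<!-- at the bottom of schema.xml: hook in a custom (per-field) similarity -->
<similarity class="com.example.MySimilarityProvider"/>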


Re: boosting based on number of terms matched?

2011-02-25 Thread Chris Hostetter

: I'm using the edismax handler, although my question is probably the same for
: dismax. When the user types a long query, I use the mm parameter so that
: only 75% of terms need to match. This works fine, however, sometimes documents
: that only match 75% of the terms show up higher in my results than documents
: that match 100%. I'd like to set a boost so that documents that match 100%
: will be much more likely to be put ahead of documents that only match 75%. Can
: anyone give me a pointer of how to do this? Thanks,

this is essentially the default behavior -- mm just sets a minimum 
number of clauses to be considered a match, but the coord factor still 
applies and penalizes docs based on how many clauses they don't match.

if you are seeing docs that match fewer terms score higher than docs 
matching more terms, it is likely because of the boosts you already have 
specified (either in the qf, or maybe using the bf), but the discrepancy 
could be based on other standard scoring factors as well (lengthNorm, 
index-time doc boosts, the IDF of the terms, etc.).

this is where it becomes necessary to start looking at score explanations 
and really thinking through the data.


-Hoss
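
One way to get those score explanations is the debugQuery parameter; a
hedged sketch (host, port, and query below are illustrative):

http://localhost:8983/solr/select?q=foo+bar&defType=edismax&mm=75%25&fl=*,score&debugQuery=true

The "explain" section of the response then shows, per document, how each
clause contributed to the score.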


Tika metadata extracted per supported document format?

2011-02-25 Thread Andreas Kemkes
Hello,

I've asked this on the Tika mailing list w/o an answer, so apologies for 
cross-posting.

I'm trying to find information that tells me specifically what metadata is 
provided for the different supported document formats.  Unfortunately all I was 
able to find so far is "The Metadata produced depends on the type of document 
submitted."

Currently, I'm using ExtractingRequestHandler from Solr 1.4 (with Tika 0.4), so 
I'm particularly interested in that version, but also in changes that are 
provided in newer versions of Tika.

Where are the best places to look for such information?

Thanks in advance,

Andreas


  

Re: DIH regex remove email + extract url

2011-02-25 Thread Rosa (Anuncios)

Hi Koji,

Yes, of course I have transformer="RegexTransformer" in my <entity/>.

What I'm not sure about is the syntax of this: <field column="source"
xpath="/product/url" regex="" /> -- don't I need any other parameter here?


Rosa

On 25/02/2011 12:21, Koji Sekiguchi wrote:

Hi Rosa,

Are you sure you have transformer="RegexTransformer" in your <entity/>?

My question was more about the solr DIH syntax. It doesn't work 
either with the new regex.


Especially the syntax for this:

<field column="source" xpath="/product/url"
regex="http:\/\/(.*?)\/(.*)" /> <--- Is it correct? (not

the regex, the syntax)?


In this case, I think you need to have two field names in groupNames,
because you have two groups, (.*?) and (.*), in your regex.
But I'm not confident. Please try if you'd like...

Koji




Re: Tika metadata extracted per supported document format?

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Andreas,

In Tika 0.8+, you can run the --list-met-models command from tika-app:

java -jar tika-app-version.jar --list-met-models

And get a print out of the met keys that Tika supports. Some parsers add their 
own that aren't part of this met listing, but this is a relatively 
comprehensive list.

Cheers,
Chris

On Feb 25, 2011, at 12:10 PM, Andreas Kemkes wrote:

 Hello,
 
 I've asked this on the Tika mailing list w/o an answer, so apologies for 
 cross-posting.
 
 I'm trying to find information that tells me specifically what metadata is 
 provided for the different supported document formats.  Unfortunately all I 
 was 
 able to find so far is The Metadata produced depends on the type of document 
 submitted.
 
 Currently, I'm using ExtractingRequestHandler from Solr 1.4 (with Tika 0.4), 
 so 
 I'm particularly interested in that version, but also in changes that are 
 provided in newer versions of Tika.
 
 Where are the best places to look for such information?
 
 Thanks in advance,
 
 Andreas
 
 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Tika metadata extracted per supported document format?

2011-02-25 Thread Andreas Kemkes
Hi Chris,

Thank you so much - that's a great start.

Andreas




From: Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Cc: u...@tika.apache.org u...@tika.apache.org
Sent: Fri, February 25, 2011 1:21:33 PM
Subject: Re: Tika metadata extracted per supported document format?

Hi Andreas,

In Tika 0.8+, you can run the --list-met-models command from tika-app:

java -jar tika-app-version.jar --list-met-models

And get a print out of the met keys that Tika supports. Some parsers add their 
own that aren't part of this met listing, but this is a relatively 
comprehensive 
list.

Cheers,
Chris

On Feb 25, 2011, at 12:10 PM, Andreas Kemkes wrote:

 Hello,
 
 I've asked this on the Tika mailing list w/o an answer, so apologies for 
 cross-posting.
 
 I'm trying to find information that tells me specifically what metadata is 
 provided for the different supported document formats.  Unfortunately all I 
 was 

 able to find so far is The Metadata produced depends on the type of document 
 submitted.
 
 Currently, I'm using ExtractingRequestHandler from Solr 1.4 (with Tika 0.4), 
 so 

 I'm particularly interested in that version, but also in changes that are 
 provided in newer versions of Tika.
 
 Where are the best places to look for such information?
 
 Thanks in advance,
 
 Andreas
 
 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++


  

Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Andreas Kemkes
According to the Tika release notes, it's fixed in 0.9.  Haven't tried it 
myself.

"A critical backwards incompatible bug in PDF parsing that was introduced in 
Tika 0.8 has been fixed." (TIKA-548)

Andreas




From: Darx Oman darxo...@gmail.com
To: solr-user@lucene.apache.org
Sent: Fri, February 25, 2011 10:33:39 AM
Subject: Re: upgrading to Tika 0.9 on Solr 1.4.1

hi
if you want to index pdf files then use Tika 0.6,
because 0.7 and 0.8 do not handle PDF parsing correctly



  

Case insensitive but number sensitive string?

2011-02-25 Thread Jon Drukman
I want a string field that is case insensitive.  This is what I tried:

 <fieldType name="cistring" class="solr.StrField" sortMissingLast="true"
            omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
   </analyzer>
 </fieldType>


However, it is matching "opengl" for "opengl128".  I want exact string matches,
but I want them case-insensitive.  What did I do wrong?



Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Mattmann, Chris A (388J)
Yep it's fixed in 0.9.

Cheers,
Chris

On Feb 25, 2011, at 2:37 PM, Andreas Kemkes wrote:

 According to the Tika release notes, it's fixed in 0.9.  Haven't tried it 
 myself.
 
 A critical backwards incompatible bug in PDF parsing that was introduced in 
 Tika 
 0.8 has been fixed. (TIKA-548)
 
 Andreas
 
 
 
 
 From: Darx Oman darxo...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, February 25, 2011 10:33:39 AM
 Subject: Re: upgrading to Tika 0.9 on Solr 1.4.1
 
 hi
 if you want to index pdf files then use Tika 0.6,
 because 0.7 and 0.8 do not handle PDF parsing correctly
 
 
 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Case insensitive but number sensitive string?

2011-02-25 Thread Ahmet Arslan
 I want a string field that is case
 insensitive.  This is what I tried:
 
  <fieldType name="cistring" class="solr.StrField"
             sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
  </fieldType>
 
 However, it is matching "opengl" for "opengl128".  I
 want exact string matches,
 but I want them case-insensitive.  What did I do
 wrong?
 

class="solr.StrField" should be class="solr.TextField" 





Re: Case insensitive but number sensitive string?

2011-02-25 Thread Jon Drukman
Ahmet Arslan iorixxx at yahoo.com writes:

 
  I want a string field that is case
  insensitive.  This is what I tried:
  
   <fieldType name="cistring" class="solr.StrField"
              sortMissingLast="true" omitNorms="true">
     <analyzer type="index">
       <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     </analyzer>
   </fieldType>
  
  However, it is matching "opengl" for "opengl128".  I
  want exact string matches,
  but I want them case-insensitive.  What did I do
  wrong?
  
 
 class="solr.StrField" should be class="solr.TextField" 
 
 

This is what I ended up with. Seems to work:

 <fieldType name="cistring" class="solr.TextField" sortMissingLast="true"
            omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>







Re: DIH regex remove email + extract url

2011-02-25 Thread Koji Sekiguchi

(11/02/26 5:24), Rosa (Anuncios) wrote:

Hi Koji,

Yes of course i have RegexTransformer in my entity/.

What I'm not sure about is the syntax of this: <field column="source"
xpath="/product/url" regex="" /> -- don't I need any other parameter here?


Hi Rosa,

So I mentioned the groupNames attribute for the field element in my previous
mail. Did you try it?

Koji
--
http://www.rondhuit.com/en/


Re: Tika metadata extracted per supported document format?

2011-02-25 Thread Andreas Kemkes
Hi Chris,

java -jar tika-app-0.9.jar --list-met-models
TikaMetadataKeys
 PROTECTED
 RESOURCE_NAME_KEY
TikaMimeKeys
 MIME_TYPE_MAGIC
 TIKA_MIME_FILE

Both 0.8 and 0.9 give me the same list.  Is that a configuration issue?

I'm a bit unclear whether that gets me what I was looking for - metadata 
like content_type or last_modified.  Or am I confusing Tika metadata 
with SolrCell metadata?

I thought SolrCell metadata comes from Tika, or does it not?

Regards,

Andreas




From: Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Cc: u...@tika.apache.org u...@tika.apache.org
Sent: Fri, February 25, 2011 1:21:33 PM
Subject: Re: Tika metadata extracted per supported document format?

Hi Andreas,

In Tika 0.8+, you can run the --list-met-models command from tika-app:

java -jar tika-app-version.jar --list-met-models

And get a print out of the met keys that Tika supports. Some parsers add their 
own that aren't part of this met listing, but this is a relatively 
comprehensive 
list.

Cheers,
Chris

On Feb 25, 2011, at 12:10 PM, Andreas Kemkes wrote:

 Hello,
 
 I've asked this on the Tika mailing list w/o an answer, so apologies for 
 cross-posting.
 
 I'm trying to find information that tells me specifically what metadata is 
 provided for the different supported document formats.  Unfortunately all I 
 was 

 able to find so far is The Metadata produced depends on the type of document 
 submitted.
 
 Currently, I'm using ExtractingRequestHandler from Solr 1.4 (with Tika 0.4), 
 so 

 I'm particularly interested in that version, but also in changes that are 
 provided in newer versions of Tika.
 
 Where are the best places to look for such information?
 
 Thanks in advance,
 
 Andreas
 
 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++


  

Re: Tika metadata extracted per supported document format?

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Andreas,

 java -jar tika-app-0.9.jar --list-met-models
 TikaMetadataKeys
 PROTECTED
 RESOURCE_NAME_KEY
 TikaMimeKeys
 MIME_TYPE_MAGIC
 TIKA_MIME_FILE
 
 Both 0.8 and 0.9 give me the same list.  Is that a configuration issue?

Strange -- those are the only met models you're seeing listed?

 
 I'm a bit unclear if that gets me to what I was looking for - metadata 
 like content_type or last_modified.  Or am I confusing Tika metadata 
 with SolrCell metadata?
 
 I thought SolrCell metadata comes from Tika, or does it not?

It does come from Tika that's for sure, but in SolrCell, there is a 
configuration for the ExtractingRequestHandler that remaps
the field names from Tika to Solr. So that's probably where it's coming from. 
Check this out:

http://wiki.apache.org/solr/ExtractingRequestHandler

HTH!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
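
As a hedged illustration of the remapping Chris mentions, the
/update/extract handler in solrconfig.xml accepts fmap.* parameters (the
destination field names below are examples, not a required setup):

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map Tika's Last-Modified metadata to a Solr field -->
    <str name="fmap.Last-Modified">last_modified</str>
    <!-- map Tika's extracted body content to the text field -->
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>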



Help on query time boosting effect using standardQueryParser

2011-02-25 Thread cyang2010
For the Solr example (exampleDIH), how do I achieve the following with the
standard query parser?

Search all docs whose name field contains "memory" (primary query logic).
Within that result set, boost the docs matching features:battery (boosting
logic).


Note that I have to use the standard query parser in my project (for the sake
of fuzzy prefix queries, etc.).


I tried using the following query, but it returns results that do not match
name:memory:

query:
<str name="rawquerystring">name:memory features:battery^100 </str>
<str name="querystring">name:memory features:battery^100 </str>
<str name="parsedquery">name:memori features:batteri^100.0</str>
<str name="parsedquery_toString">name:memori features:batteri^100.0</str>

generated result:
<result name="response" numFound="4" start="0" maxScore="0.23815624">
The top result only contains features:battery but does not have "memory"
in the name field -- an unexpected result. I don't want such docs as part of
the result.



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-on-query-time-boosting-effect-using-standardQueryParser-tp2579763p2579763.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help on query time boosting effect using standardQueryParser

2011-02-25 Thread cyang2010
Once I change the query to be:

+name:memory features:battery^100 

<str name="rawquerystring">+name:memory features:battery^100 </str>
<str name="querystring">+name:memory features:battery^100 </str>
<str name="parsedquery">+name:memori features:batteri^100.0</str>
<str name="parsedquery_toString">+name:memori features:batteri^100.0</str>


Then it gets rid of the results that do not match name:memory.


However, I wonder if there is a better way of achieving this?  Is there
something in the standard query parser that just affects the ranking?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-on-query-time-boosting-effect-using-standardQueryParser-tp2579763p2579800.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr running on many sites

2011-02-25 Thread Bill Bell
You would want to evaluate the size and the number of searches, plus how often 
the index will need changed data.

There is no recipe just good experience.

Bill Bell
Sent from mobile


On Feb 25, 2011, at 3:06 AM, Stefan Matheis matheis.ste...@googlemail.com 
wrote:

 Hi Grant,
 
 Multi Sites == Multi Cores? :) http://wiki.apache.org/solr/MultiCore have a 
 look
 
 Regards
 Stefan
 
 On Fri, Feb 25, 2011 at 3:15 AM, Grant Longhurst
 grant.longhu...@ecorner.com.au wrote:
 Hi,
 
 
 
  We are an e-commerce service provider and are looking at using Solr for
  all the site searches. Was just wondering what the best way is to set it
  up for many sites instead of just one?
 
 
 
 Thanks.
 
 
 
 Regards,
 
 Grant Longhurst
 
 Technical Consultant
 
 eCorner Pty Ltd
 
 
 
 Phone: +61 2 9494 0200
 
 Email: grant.longhu...@ecorner.com.au
 mailto:grant.longhu...@ecorner.com.au
 
 
 
 Web - www.ecorner.com.au
 
 Buy a store - www.ecornerstoresplus.com.au
 http://www.ecornerstoresplus.com.au/
 
 Buy email security - www.cloudmark.com.au
 
 Need Support -  www.ecorner.com.au/support
 http://www.ecorner.com.au/support
 
 Need Help! -  help.ecorner.com http://help.ecorner.com/
 
 
 
 


How to handle special character in filter query

2011-02-25 Thread cyang2010
How to handle special characters when constructing a filter query?

For example, I want to do something like:

http://...&fq=genre:ACTION & ADVENTURE


How do I handle the space and the & in the filter query part?


Thanks.
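
A hedged sketch of one common approach (the host and q value below are
illustrative): quote the value so the space and the & stay inside a single
term, then URL-encode the reserved characters (%22 = ", %20 = space,
%26 = &):

fq=genre:"ACTION & ADVENTURE"

which on the URL becomes:

http://localhost:8983/solr/select?q=*:*&fq=genre:%22ACTION%20%26%20ADVENTURE%22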




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-handle-special-character-in-filter-query-tp2579978p2579978.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problems with JSP pages?

2011-02-25 Thread Lance Norskog
I'm on Windows Vista, using the trunk. Some of the JSP pages do not
execute, but instead Jetty downloads them.

solr/admin/get-properties.jsp for example. This is called by the 'JAVA
PROPERTIES' button in the main admin page.

Is this a known problem/quirk for Windows? Or fallout from a jetty
change? Or...?

-- 
Lance Norskog
goks...@gmail.com