Re: Tika on Jenkins?

2014-01-29 Thread Mattmann, Chris A (398J)
+1 to using travis-ci..

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-283, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Jukka Zitting jukka.zitt...@gmail.com
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Wednesday, January 29, 2014 1:06 PM
To: Tika Development dev@tika.apache.org
Subject: Re: Tika on Jenkins?

Hi,

On Tue, Jan 28, 2014 at 10:37 AM, Allison, Timothy B.
talli...@mitre.org wrote:
> How do we fix the Tika build in Jenkins?

I've kind of lost hope with ASF's Jenkins installation, it's been
broken in various ways for years now. I used to participate in
administering the service, but I guess it's gotten way too complex
nowadays for part-time volunteers to manage.

Unless one of us wants to step up and get their hands dirty fixing
Jenkins issues, I'd probably opt to disable the Jenkins build entirely
and instead use something like Travis (https://travis-ci.org/) for our
CI builds.

BR,

Jukka Zitting



Re: Extract thumbnail from openxml office files

2014-01-09 Thread Mattmann, Chris A (398J)
Hi Hong-Thai,

+1 to using cardinality to help denote more complex metadata relationships,
at least until we get past the prior discussions on Metadata and namespacing.

See the wiki for some earlier thoughts:
http://wiki.apache.org/tika/MetadataDiscussion


I know our metadata structure is simple. It was purposefully designed that
way: very complex, hierarchical metadata structures existed at the time and
could have been leveraged, but we chose a simple approach instead, i.e., key
to multi-value (note the distinction from key to single value).

Thanks!

Cheers,
Chris



-Original Message-
From: Hong-Thai Nguyen hong-thai.ngu...@polyspot.com
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Thursday, January 9, 2014 8:36 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: Extract thumbnail from openxml office files

Hi Nick,
You're opening a very interesting topic about the foundation of our metadata
concept :)
I agree with you that metadata is not the best place to store the thumbnail
result. Until now, our metadata has been a simple map of key:values. This
structure is not really flexible in some cases. For example, we might want to
store author information, where each author has a first name and a last name.
Ideally, we could have something like a struct:
Person:
   FirstName
   LastName

Another example is our future thumbnail. We could have a 'thumbnail'
metadata entry with a hierarchical structure like:
Thumbnail:
   Dimension
   Width
   Length
   MimeType
   Extension
   Pages
   Description

That would need a huge refactoring of our core model. Another solution is
to keep the thumbnail result as a list, List<byte[]>, instead of a single
value. Each element is the thumbnail of one page. If the list has only one
element, that means there is only a thumbnail of the first page.

Hong-Thai

-Original Message-
From: Nick Burch [mailto:apa...@gagravarr.org]
Sent: Thursday, January 9, 2014 12:11
To: dev@tika.apache.org
Subject: RE: Extract thumbnail from openxml office files

On Thu, 9 Jan 2014, Hong-Thai Nguyen wrote:
> By searching on issues, I found the issue already created:
> https://issues.apache.org/jira/browse/TIKA-90

I'm not sure if the metadata is the right place to return this. Some
formats offer a small thumbnail, others can offer a small thumbnail for
every page, and at least one can include a full-size image of the first
page.

Would we not be better off exposing these embedded renderings via the
existing embedded resources handling, with some sort of handy way to
identify what something is (eg this is a full-size PNG of page 1, this is
a jpg thumbnail of page 3)?

Nick
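
One way to picture the "handy way to identify what something is" that Nick describes is a small descriptor attached to each embedded rendering. The sketch below is only an illustration under stated assumptions: `EmbeddedRendering`, `RenderingCatalog`, and their fields are hypothetical names invented here, not part of Tika's embedded-resource API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical descriptor for one embedded rendering; not a Tika class. */
final class EmbeddedRendering {
    enum Kind { THUMBNAIL, FULL_SIZE }

    final Kind kind;
    final int page;          // 1-based page the rendering depicts
    final String mediaType;  // e.g. "image/png"
    final byte[] data;       // raw image bytes

    EmbeddedRendering(Kind kind, int page, String mediaType, byte[] data) {
        this.kind = kind;
        this.page = page;
        this.mediaType = mediaType;
        this.data = data.clone();
    }

    /** Human-readable label such as "thumbnail image/jpeg of page 3". */
    String describe() {
        String size = (kind == Kind.FULL_SIZE) ? "full-size" : "thumbnail";
        return size + " " + mediaType + " of page " + page;
    }
}

/** Hypothetical container that lets a caller look renderings up by page. */
final class RenderingCatalog {
    private final List<EmbeddedRendering> renderings = new ArrayList<>();

    void add(EmbeddedRendering rendering) {
        renderings.add(rendering);
    }

    /** Returns every rendering recorded for the given page (possibly none). */
    List<EmbeddedRendering> forPage(int page) {
        List<EmbeddedRendering> matches = new ArrayList<>();
        for (EmbeddedRendering r : renderings) {
            if (r.page == page) {
                matches.add(r);
            }
        }
        return Collections.unmodifiableList(matches);
    }

    public static void main(String[] args) {
        RenderingCatalog catalog = new RenderingCatalog();
        catalog.add(new EmbeddedRendering(
                EmbeddedRendering.Kind.FULL_SIZE, 1, "image/png", new byte[0]));
        catalog.add(new EmbeddedRendering(
                EmbeddedRendering.Kind.THUMBNAIL, 3, "image/jpeg", new byte[0]));
        for (EmbeddedRendering r : catalog.forPage(3)) {
            System.out.println(r.describe()); // thumbnail image/jpeg of page 3
        }
    }
}
```

Compared with a flat List<byte[]>, a descriptor like this covers formats that offer one thumbnail, one per page, or a full-size first page without changing the key:values metadata model.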



Re: Apache tika installation issue

2013-09-27 Thread Mattmann, Chris A (398J)
Dear Sudheer,

Did you receive a reply to your question?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: sudheer y sudhe...@datahubsoftware.in
Date: Tuesday, September 17, 2013 12:02 AM
To: dev-ow...@tika.apache.org dev-ow...@tika.apache.org
Subject: Apache tika installation issue

Dear Experts, 


Can you give a step-by-step guide to installing Apache Tika in Eclipse using
Maven on Windows?
 


-- 
Thanks & Best Regards,
Sudheer Kumar Y

Software Engineer

DATAHUB SOFTWARE INDIA PVT LTD. | MAKING IT POSSIBLE
Mobile: +91 8143161684

Email: sudhe...@datahubsoftware.in
WEB : http://www.datahubsoftware.com

Re: Apache Tika for Android

2013-08-30 Thread Mattmann, Chris A (398J)
Hi Vasiliy,

It would be great if you could use Apache Tika (which in turn uses
Apache PDFBox) and if it will run on Android. Have you tried it?

Cheers,
Chris

-Original Message-
From: Василий Саржинский vasiliy.sarzhins...@mail.ru
Reply-To: Василий Саржинский vasiliy.sarzhins...@mail.ru
Date: Thursday, August 29, 2013 11:21 PM
To: Oleg Tikhonov olegtikho...@gmail.com, dev@tika.apache.org
dev@tika.apache.org, dev-owner dev-ow...@tika.apache.org
Subject: Apache Tika for Android




Hello, Oleg!

Yes, you are right. But I have searched extensively on the Internet and
unfortunately couldn't find a tool or library that runs on Android and is
free for commercial use. There are many such tools and libraries for Android
(e.g. iText, Qoppa, pdflib TET), but they are not free of charge. If you
know of any free tools on Android for extracting text from PDF, could you
please advise me on what to use?
Thanks a lot!


With Best Regards,
Vasiliy Sarzhinskiy





Re: Would become a commiter

2013-07-31 Thread Mattmann, Chris A (398J)
Dear Hong-Thai,

Thanks for your interest in the project! Also thanks for your
recent contributions.

Apache is a meritocracy and committership/PMC membership is decided
upon by the Tika PMC. We again appreciate your interest in the project.

Cheers,
Chris


-Original Message-
From: Hong-Thai Nguyen hong-thai.ngu...@polyspot.com
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Wednesday, July 31, 2013 5:43 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: Would become a commiter

Hi all,
I'm currently working at PolySpot, a provider of search engine solutions.
Tika is one of the components on our connector side, used to extract content
from many kinds of files. We frequently have to upgrade Tika within our
product, and test and occasionally fix Tika parsing bugs. We also have to
publish temporary builds to our local repository while waiting for the next
official Tika release.
 
Given this synergy, I would like to become a committer on the Tika project.
 
Regards,
 
Hong-Thai Nguyen,
PhD 
R&D Engineer
DDI: +33 (0)1 77 75 73 15
 
Mob: +33 (0)6 27 04 86 22
Skype: thaichat04
hong-thai.ngu...@polyspot.com
 http://www.polyspot.com/
 http://twitter.com/polyspot
http://www.linkedin.com/company/polyspot
79, rue du Faubourg Poissonnière
75009 Paris - France
Access map http://g.co/maps/3e53
 



Re: [Announce] Welcome Tim Allison as Tika PM member and committer

2013-07-30 Thread Mattmann, Chris A (398J)
Welcome, Tim!

-Original Message-
From: Nick Burch n...@apache.org
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Tuesday, July 30, 2013 5:29 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: [Announce] Welcome Tim Allison as Tika PM member and committer

Hi All

The Tika PMC VOTE'd to add Tim Allison tallison@ to our merry group as a
PMC member and committer. Welcome, Tim! Please feel free to say a bit
about yourself.

Cheers
Nick



Re: Patches for parser.microsoft.WordExtractor

2013-07-01 Thread Mattmann, Chris A (398J)
Dear Denis,

Thank you for your contribution to Tika!

Filing an issue would be great, head over here:

https://issues.apache.org/jira/browse/TIKA

Please sign up for an account, create an issue
and then attach your patch there. I for one would
welcome the contribution and am happy to help shepherd
it into the sources.

Thank you!

Cheers,
Chris

-Original Message-
From: kildishev kildis...@ispras.ru
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Monday, July 1, 2013 5:00 AM
To: dev@tika.apache.org dev@tika.apache.org
Cc: Khoroshilov khoroshi...@ispras.ru
Subject: Patches for parser.microsoft.WordExtractor

Dear Tika developers,

My name is Denis Kildishev and I work for the Institute for System
Programming of the Russian Academy of Sciences (ISPRAS). We use Apache
Tika in our open source project Requality
(https://forge.ispras.ru/projects/reqdb) for DOC-to-XHTML conversion. One
of our requirements is to get an XHTML visual representation close to
that of the original DOC file.

Working with the current version of Tika, we found that some improvements
can be made to it. I'd like to introduce some modifications that we made
to WordExtractor from the parsers package. They include support for lists,
table borders (according to the 2007 specification) and some additional
changes to styling and indents. Also, our version of this parser has an
XHTML command buffer that helps deal with the problem of nested
tables. If possible, I'd like to contribute those changes back to
the Tika project. As the first of the possible patches, I'd like to present
the changes to table representation.

This patch includes changes to table representation. The information
about border color follows the specification of the 2007 format. Cell
spanning is taken from the POI HTML parser.

Some of the patches, including this one, alter the structure of the
generated XHTML file. Various changes are made to the existing unit tests
to deal with this. All those changes preserve the original test purposes,
but achieve them in a different way. One example is the check that a table
is present in the output file. In the current trunk version, this is checked
by looking for a bare table construct. When we introduce styling on the
table, this construct is no longer correct, so we look for the table element
instead.

I will create a corresponding ticket and I will attach my patch there.
This is my first contribution to an Apache project, so I would appreciate
your guidance on how to proceed with it.

Yours sincerely,
Denis Kildishev
Software Engineering Department, ISPRAS



Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Mattmann, Chris A (398J)
Hey Uwe, seems to work on both the latest Maven 3.0.3 and 2.2.1 for me.

Cheers,
Chris

-Original Message-
From: Uwe Schindler u...@thetaphi.de
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Sunday, June 16, 2013 10:35 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: [VOTE] Apache TIka 1.4 Release Candidate #1

I forgot: What is the official TIKA-supported Maven version?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Sunday, June 16, 2013 7:33 PM
 To: dev@tika.apache.org
 Subject: RE: [VOTE] Apache TIka 1.4 Release Candidate #1
 
 Hi Mike,
 
 Do you have the SVN coordinates to check out, and which Maven command should
 I run (for a randomized JDK I need a free-style build, so I need the correct
 Maven command line + goals to test everything)? How do I pass JDK command
 line options to Maven's Surefire (like -Xfoobar -XX:+UseG1GC...)? In Lucene
 it's -Dargs=...; what is the Maven equivalent to drive the Surefire child
 JVM?
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Michael McCandless [mailto:luc...@mikemccandless.com]
  Sent: Sunday, June 16, 2013 4:43 PM
  To: dev@tika.apache.org
  Subject: Re: [VOTE] Apache TIka 1.4 Release Candidate #1
 
  On Sun, Jun 16, 2013 at 10:13 AM, Uwe Schindler u...@thetaphi.de
 wrote:
   I can setup a windows build on the well-known Policeman Jenkins
   server with the famous random JDK versions and many more features,
   running Lucene tests in 24/7 :-) http://goo.gl/qnxlJ for the talk
   http://jenkins.thetaphi.de/
 
  +1!
 
  Mike McCandless
 
  http://blog.mikemccandless.com




Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Mattmann, Chris A (398J)
Hey Uwe,

Hmmm, I think command line options for Maven work the same way as in
Lucene (e.g., -Dargs).

Give it a shot and let me know how it works :)

Cheers,
Chris

-Original Message-
From: Michael McCandless luc...@mikemccandless.com
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Sunday, June 16, 2013 10:43 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: Re: [VOTE] Apache TIka 1.4 Release Candidate #1

On Sun, Jun 16, 2013 at 1:32 PM, Uwe Schindler u...@thetaphi.de wrote:

> Do you have coordinates of SVN to checkout

svn checkout https://svn.apache.org/repos/asf/tika/trunk should work.

> and which Maven Command to run (for randomized JDK I need a free-style
> build, so I need the correct Maven command line+ goals to test
> everything).

I think just mvn test?

> How to pass JDK command line options to Maven's Surefire (like -Xfoobar
> -XX:+UseG1GC... - in Lucene it's -Dargs=..., what is the maven
> equivalent to drive the Surefire Child JVM)?

Hmm this is beyond me!  Anyone else?

Mike McCandless

http://blog.mikemccandless.com
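
On the open question of passing JVM options to the forked test JVM: the Maven Surefire plugin exposes an `argLine` parameter that can normally be set from the command line as `-DargLine=...`, unless the project's POM already sets it explicitly (whether Tika's POM does was not checked here). Whichever mechanism is used, a small probe run inside the test JVM can confirm that the flags actually arrived. The class below is a minimal sketch; `JvmArgsProbe` is a name chosen for illustration and is not part of Tika or Surefire.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Illustrative helper: reports the options the current JVM was started with. */
final class JvmArgsProbe {

    /** JVM options (not program arguments) as seen by the running JVM. */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** True if the exact flag, e.g. "-XX:+UseG1GC", is among the given options. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = inputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("G1 requested: " + hasFlag(jvmArgs, "-XX:+UseG1GC"));
    }
}
```

Calling `JvmArgsProbe.inputArguments()` from a throwaway unit test and printing the result shows what the child JVM received, which is useful when a randomized-JDK build mixes several option sets.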



Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Mattmann, Chris A (398J)
Hey Oleg, are you sending from an address that's subscribed to the list?
I got this email. I also received your earlier +1 on RC #1 for 1.4.

Cheers,
Chris

-Original Message-
From: Oleg Tikhonov olegtikho...@gmail.com
Reply-To: dev@tika.apache.org dev@tika.apache.org, o...@apache.com
o...@apache.com
Date: Sunday, June 16, 2013 11:16 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: Re: [VOTE] Apache TIka 1.4 Release Candidate #1

I tried to send some comments about the release candidate, but got a
delivery failure error. Am I off the list?

BR,
Oleg


On Sun, Jun 16, 2013 at 9:07 PM, Chris Mattmann mattm...@apache.org
wrote:

 Ouch, just saw this. Oliver, I'm happy to commit the updated patch
 to the trunk but do you absolutely need this in 1.4 requiring me
 to spin up an RC #3?

 Cheers,
 Chris


 -Original Message-
 From: Oliver Heger oliver.he...@oliver-heger.de
 Reply-To: dev@tika.apache.org dev@tika.apache.org
 Date: Sunday, June 16, 2013 10:25 AM
 To: dev@tika.apache.org dev@tika.apache.org
 Subject: Re: [VOTE] Apache TIka 1.4 Release Candidate #1

 Am 16.06.2013 05:52, schrieb Chris Mattmann:
  Hi Guys,
 
  A candidate for the Tika 1.4 release is available at:
 
   http://people.apache.org/~mattmann/apache-tika-1.4/rc1/
 
  The release candidate is a zip archive of the sources in:
 
  http://svn.apache.org/repos/asf/tika/tags/1.4/
 
 
  The SHA1 checksum of the archive is
  1e523c6ed06b4d095d7f6e93a04a8d2ab43c7226.
 
  A staged M2 repository can also be found on repository.apache.org
here:
 
  https://repository.apache.org/content/repositories/orgapachetika-020/
 
 
  Please vote on releasing this package as Apache Tika 1.4.
  The vote is open for the next 72 hours and passes if a majority of at
  least three +1 Tika PMC votes are cast.
 
   [ ] +1 Release this package as Apache Tika 1.4
   [ ] -1 Do not release this package because...
 
  Here is my +1 for the release.
 
  Cheers,
  Chris
 
 
 There is a minor issue with TIKA-991: The original patch had been
 applied, but in the meantime I discovered that the code could enter an
 infinite loop under certain circumstances. Therefore, I provided a
 second patch (the small attachment from Feb 15th). Could this patch be
 applied, too, before the release?
 
 Thanks
 Oliver
 






Re: [VOTE] Apache TIka 1.4 Release Candidate #2

2013-06-16 Thread Mattmann, Chris A (398J)
Hey Uwe,

-Original Message-

From: Uwe Schindler u...@thetaphi.de
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Sunday, June 16, 2013 2:41 PM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: [VOTE] Apache TIka 1.4 Release Candidate #2

With Maven 3.0.4 it worked because Maven 3 is able to resolve
inter-module dependencies inside the reactor. Maven 2 needs all artifacts
to be installed so a plain mvn test without mvn install before does
not work:

http://ahoehma.wordpress.com/2010/12/22/intermodule-dependencies-now-better-working-with-maven-3/

+1. I've found that with Maven 2.2.1 I need to run mvn install before
mvn test, IIRC. Maven 3 exhibits the same behavior for me too.

So nothing has really changed with 1.4 compared to other releases.


But I found a problem:
Tika does not work with IBM J9: the ForkParserTest fails horribly with
class-not-found errors and so on:
http://jenkins.thetaphi.de/job/Tika-1.4RC2-Linux/12/consoleFull

Would be great if you could file a ticket for 1.5 with this.

Thanks Uwe.

Cheers,
Chris


Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Sunday, June 16, 2013 11:11 PM
 To: dev@tika.apache.org
 Subject: RE: [VOTE] Apache TIka 1.4 Release Candidate #2
 
 Hi,
 
 I tried a Random JVM Jenkins build on Linux from the repository (with a
 clean ~/.mvn folder, so no Maven config at all), but it fails with the given
 error:
 http://jenkins.thetaphi.de/job/Tika-1.4RC2-Linux/5/consoleFull
 
 It tries to download Tika from Maven Central although it's not yet there.
 Looks like a circular dependency to itself...
 The same happens in trunk, but there it works because it downloads the
 previous snapshot from the Apache Snapshots repo.
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Chris Mattmann [mailto:mattm...@apache.org]
  Sent: Sunday, June 16, 2013 8:07 PM
  To: dev@tika.apache.org
  Cc: u...@tika.apache.org
  Subject: [VOTE] Apache TIka 1.4 Release Candidate #2
 
  Hi Guys,
 
  A second candidate for the Tika 1.4 release is available at:
 
  http://people.apache.org/~mattmann/apache-tika-1.4/rc2/
 
  The release candidate is a zip archive of the sources in:
 
  http://svn.apache.org/repos/asf/tika/tags/1.4-rc2/
 
  The SHA1 checksum of the archive is
  84ce9ebc104ca348a3cd8e95ec31a96169548c13
 
  A staged M2 repository can also be found on repository.apache.org
here:
 
  https://repository.apache.org/content/repositories/orgapachetika-022/
 
 
  Please vote on releasing this package as Apache Tika 1.4.
  The vote is open for the next 72 hours and passes if a majority of at
  least three +1 Tika PMC votes are cast.
 
  [ ] +1 Release this package as Apache Tika 1.4
  [ ] -1 Do not release this package because...
 
  Here is my +1 for the release.
 
  Cheers,
  Chris
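
The vote email quotes a SHA1 checksum for the source archive. A voter can recompute the digest of the downloaded file and compare it with the quoted value. The following is a minimal sketch using only the JDK; the class name `Sha1Check` and its command-line usage are assumptions made for illustration and are not part of the Tika release process.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helper for comparing a file's SHA-1 digest with a published value. */
final class Sha1Check {

    private static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Lower-case hex SHA-1 of an in-memory byte array. */
    static String sha1Hex(byte[] data) {
        return toHex(newSha1().digest(data));
    }

    /** Lower-case hex SHA-1 of a file, streamed in 8 KiB chunks. */
    static String sha1Hex(Path file) throws IOException {
        MessageDigest md = newSha1();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return toHex(md.digest());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Sha1Check <archive> <expected-sha1>");
            return;
        }
        boolean ok = sha1Hex(Paths.get(args[0])).equalsIgnoreCase(args[1].trim());
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    }
}
```

Run against the downloaded release candidate archive, with the checksum from the vote email as the second argument, it prints whether the two agree.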
 
 
 
 
 
 
 
 
 





[ANNOUNCE] Open Source Summit 3.0: Communities Meeting: June 25,26, Washington DC USA

2013-05-30 Thread Mattmann, Chris A (398J)
(apologies for cross-post)

http://ossummit.org/

Registration for the Open Source Summit v3.0: Communities Meeting
is now open! This event grows out of the past two years of success.
The first Open Source Summit http://www.nasa.gov/open/source/ was
held at NASA's Ames Research Center in Mountain View, CA and focused
specifically on NASA's open source policies. Last year, OSS
http://open.nasa.gov/summit/ [full schedule]
https://hackpad.com/Open-Source-Summit-2012-Live-Schedule-dXG9B8U2KkZ
moved to the University of Maryland in College Park and broadened
the discussion to include all agencies, including NASA, the State
Dept, and the VA on the planning team.

This year's Open Source Summit will explain how to build, engage
with, and maintain open source communities -- and when we say open
source, we don't just mean software, we also mean hardware and data.

If you are a federal civil servant that needs to build or engage
with an open source community, you should plan on attending.

Be warned however: this is not your average event! The multi-agency
planning team is tasked with ensuring that the event provides
substantive benefit to federal agency personnel, and the format is
uniquely designed to deliver not just abstract content from subject
matter experts (of course we have those), but also the opportunity
to see this knowledge applied to a specific case study, and then
to learn how to apply it to your specific situation.  In addition,
we will collate the results of the discussions during the event and
make them available afterwards so that others may learn from the
shared experiences and wisdom of their peers.

Also, FYI: registration is unlimited for government
employees, but limited to 70 for non-govies. We have a total
registration limit of 200. Please use the #ossdc hashtag
if you want to have conversations on Twitter about the meeting.

Thanks and I look forward to your registrations and interest! If
you have any questions, please don't hesitate to ask, either by replying
to this email, contacting me or anyone else on the Planning Team directly,
or by sending us a Tweet!

Thanks!

Cheers, 
Chris Mattmann (on behalf of the Planning Team)



Wanting to contribute to Tika (was Re: [jira] [Commented] (TIKA-992) OpenGraph meta tags to allow multiple values)

2013-05-13 Thread Mattmann, Chris A (398J)
Thanks Pankaj. You may want to start a new thread with specific topics
that you'd like to discuss. This is a thread related to JIRA and TIKA-992
specific to OpenGraph.

I suggest you:

* hang around on dev@ and see if there are topics that interest you that
spring up, and contribute to the discussion there
* review Tika code and suggest improvements, etc., to it, in new threads
or in Tika JIRA, referenced below
* review Tika JIRA and existing open bugs/issues and contribute there

HTH!

Cheers,
Chris

-Original Message-
From: Pankaj Kumar pank...@usc.edu
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Monday, May 13, 2013 10:04 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: Re: [jira] [Commented] (TIKA-992) OpenGraph meta tags to allow
multiple values

Hello All,

I am new to Apache Tika and am very interested in doing some
projects using it.
So it would be very kind of you if you could suggest some project
ideas.

With Regards,
Pankaj Kumar



On Sun, May 12, 2013 at 12:49 PM, kiran (JIRA) j...@apache.org wrote:


 [ https://issues.apache.org/jira/browse/TIKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655622#comment-13655622 ]

 kiran commented on TIKA-992:
 

 Hi,

 MultiValues are not stored for any metatags in the HTML and any metatag
 can have multiValued fields too.

 When we use Tika for parsing with Nutch, we noticed that Tika does not
 store the multiValues for any html metatag. Tika only places one value in
 the DOM tree as reported in NUTCH-1467.

 Does this patch allow Tika to have multiValues for any metatag or just
 OpenGraph metatags ?


  OpenGraph meta tags to allow multiple values
  
 
  Key: TIKA-992
  URL: https://issues.apache.org/jira/browse/TIKA-992
  Project: Tika
   Issue Type: Bug
 Affects Versions: 1.3
 Reporter: Markus Jelsma
 Priority: Minor
  Fix For: 1.4
 
  Attachments: TIKA-992-1.3-1.patch
 
 
  HtmlHandler should use Metadata.add() for Open Graph properties instead
  of the HtmlHandler.addHtmlMetadata() method which uses Metadata.set(). The
  og:* properties can be multivalued. The Metadata.set() method overwrites
  previous entries because it doesn't use Metadata.appendedValues().
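
The difference the issue describes, set() replacing earlier values while add() keeps them, can be shown with a stand-in key to multi-value store. The class below is a minimal sketch written for this illustration: `MultiValueMetadataSketch` is a hypothetical name and is not Tika's Metadata class, although its `set` and `add` methods are meant to mirror the semantics described in the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a key -> multi-value metadata store; NOT Tika's Metadata class. */
final class MultiValueMetadataSketch {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Replaces whatever was stored under the name with a single value. */
    void set(String name, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(name, single);
    }

    /** Appends a value under the name, keeping any earlier values. */
    void add(String name, String value) {
        values.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
    }

    /** Returns a copy of the values stored under the name (empty if none). */
    List<String> getValues(String name) {
        List<String> found = values.get(name);
        return found == null ? new ArrayList<>() : new ArrayList<>(found);
    }

    public static void main(String[] args) {
        MultiValueMetadataSketch viaSet = new MultiValueMetadataSketch();
        viaSet.set("og:image", "a.png");
        viaSet.set("og:image", "b.png");
        System.out.println(viaSet.getValues("og:image")); // [b.png]

        MultiValueMetadataSketch viaAdd = new MultiValueMetadataSketch();
        viaAdd.add("og:image", "a.png");
        viaAdd.add("og:image", "b.png");
        System.out.println(viaAdd.getValues("og:image")); // [a.png, b.png]
    }
}
```

With two og:image tags on a page, the set() path keeps only the last one, which matches the single-value behaviour reported from Nutch in the comment above.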
