Re: [CODE4LIB] Image de-duping and file identification

2013-03-19 Thread Carmen Mitchell
Heh, well, she works with teams of students and willing volunteers from
native communities. The faculty member in question documents endangered
languages and has worked on language revitalization efforts with several
communities,
including the Oklahoma Kickapoo, the Jicarilla Apache, the Q'anjob'al Maya
community in San Diego, and the Ixhil Maya community in Nebaj, El Quiché,
Guatemala. She wants to make all her research and data public - and has
permission to do so - but there needs to be some assessment and structure
to this. It's a cool (if daunting) project.

We're starting with the JPEGs. I still have the audio and video files to
think about. :-(


On Tue, Mar 19, 2013 at 3:08 PM, Kyle Banerjee wrote:

> On Tue, Mar 19, 2013 at 1:51 PM, Carmen Mitchell
> wrote:
>
> > We are now working on de-duping and assessing file size, focusing on the
> > JPEGs first. With over 300,000 of them...it might take a while. (Of
> > course they aren't following any kind of file naming structure,
> > either...It's a mess.)
> >
>
> 300K files in 10 years? That's more than 80 files per day, 7 days a week,
> 365 days per year. What the heck is this stuff? The method of organizing is
> going to depend on what it is since no one is going to be able to actually
> look at these things.
>
> Locating outright dups is totally braindead but you may have to deal with
> dups that have been resized or altered in some other way. At least for the
> images, exiftool can be handy for that purpose because whatever created the
> photos will have added all kinds of metadata that can be analyzed. Exiftool
> is also really handy for prioritizing processing, and assigning metadata.
>
> kyle
>
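The "outright dups" pass Kyle mentions could look roughly like the sketch below. It assumes byte-identical copies are the target, so resized or re-saved variants will not match and would need the metadata comparison he describes. The function names, the SHA-256 choice, and the extension filter are illustrative assumptions, not anything from the thread.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root, suffixes=(".jpg", ".jpeg")):
    """Group byte-identical files under root (hypothetical helper).

    Returns a list of groups; each group is a sorted list of paths
    whose contents hash to the same digest.
    """
    # Pass 1: bucket by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file() and p.suffix.lower() in suffixes:
            by_size[p.stat().st_size].append(p)

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a byte-identical twin
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Bucketing by size first means most of the 300,000 files should never need to be read in full, since only files that share a size with another file get hashed.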


Re: [CODE4LIB] Image de-duping and file identification

2013-03-19 Thread Carmen Mitchell
Thanks, Shaun and Terry. I'll pass this info along. Terry, I may have Tyson
contact you directly if he has questions. I look forward to seeing your
lightning talk!

Carmen


On Tue, Mar 19, 2013 at 2:09 PM, Shaun Ellis wrote:

> Carmen,
> If you are only interested in de-duping and assessing file size, it may be
> overkill.  Picasa has some good organizing and browsing features.  Your
> developer may want to look at the Picasa (Desktop Client) Button API, which
> can kick off scripts for processing selected photos:
> https://developers.google.com/picasa/docs/button_api
>
> -Shaun
>
>
> On 3/19/13 4:51 PM, Carmen Mitchell wrote:
>
>> Hello Code4Libbers,
>>
>> I'm working with a faculty member and trying to help them to formalize
>> their data collection practices. Part of this process is also going through
>> old data and trying to assess what they currently have. This particular
>> faculty member has been doing research for 10 years without any kind of
>> structure or regular method. So far we have over 2 TB of data in various
>> states. (With more to come.)
>>
>> I've got a programmer working with me to:
>> a) identify file types
>> b) count how many files of each type
>>
>> We are now working on de-duping and assessing file size, focusing on the
>> JPEGs first. With over 300,000 of them...it might take a while. (Of
>> course they aren't following any kind of file naming structure,
>> either...It's a mess.)
>>
>> Any tips or tricks or tools that you might know of to help speed up this
>> process? Is there a good image recognition tool that you could suggest that
>> would help us with automation?
>>
>>   Thanks,
>>
>> Carmen Mitchell
>> Institutional Repository Librarian
>> Cal State San Marcos
>>
>


[CODE4LIB] Image de-duping and file identification

2013-03-19 Thread Carmen Mitchell
Hello Code4Libbers,

I'm working with a faculty member and trying to help them to formalize
their data collection practices. Part of this process is also going through
old data and trying to assess what they currently have. This particular
faculty member has been doing research for 10 years without any kind of
structure or regular method. So far we have over 2 TB of data in various
states. (With more to come.)

I've got a programmer working with me to:
a) identify file types
b) count how many files of each type

We are now working on de-duping and assessing file size, focusing on the
JPEGs first. With over 300,000 of them...it might take a while. (Of
course they aren't following any kind of file naming structure,
either...It's a mess.)

Any tips or tricks or tools that you might know of to help speed up this
process? Is there a good image recognition tool that you could suggest that
would help us with automation?

 Thanks,

Carmen Mitchell
Institutional Repository Librarian
Cal State San Marcos
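
Steps (a) and (b) above, plus a per-type size total, could be sketched as follows. Since the files follow no naming structure, this assumes extensions cannot be trusted and checks the leading "magic" bytes instead. The signature table covers only a handful of common formats, and `sniff_type` and `inventory` are hypothetical names. A dedicated identification tool would recognize far more formats.

```python
from collections import Counter
from pathlib import Path

# Leading byte signatures for a few common formats (deliberately incomplete).
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"II*\x00", "tiff"),   # little-endian TIFF
    (b"MM\x00*", "tiff"),   # big-endian TIFF
]


def sniff_type(path):
    """Guess a file's type from its first bytes, ignoring its name."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def inventory(root):
    """Walk root and return (file count per type, total bytes per type)."""
    counts, sizes = Counter(), Counter()
    for p in Path(root).rglob("*"):
        if p.is_file():
            kind = sniff_type(p)
            counts[kind] += 1
            sizes[kind] += p.stat().st_size
    return counts, sizes
```

Anything reported as "unknown" is simply outside this small table, so audio and video files would land there until more signatures are added.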


Re: [CODE4LIB] Groupon: $9 for 3-Day CTA Pass

2013-01-25 Thread Carmen
FWIW, our passes showed up in San Diego today. Plenty of time to spare! 

Carmen

On Jan 16, 2013, at 8:31 AM, Bill Dueber wrote:

> I guess it depends on when you're leaving, but by my numbers it's more than
> three weeks until the conference...
> 
> 
> On Wed, Jan 16, 2013 at 11:22 AM, Wilhelmina Randtke wrote:
> 
>> It says "Allow up to 3 weeks for delivery of CTA Pass."  This is better if
>> you are going to ALA over the summer, or something else more in the future.
>> 
>> -Wilhelmina Randtke
>> 
>> 
>> On Wed, Jan 16, 2013 at 10:17 AM, Carmen Mitchell
>> wrote:
>> 
>>> For the folks going to Chicago this year...This is a great deal.
>>> 
>>> $9 for a 3-Day Pass from the Chicago Transit Authority ($20 Value)
>>> http://www.groupon.com/deals/chicago-transit-authority-cta-3?utm_campaign=UserReferral_dp&utm_medium=email&utm_source=uu83298
>>> 
>>> -Carmen
> 
> 
> 
> -- 
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library


Re: [CODE4LIB] Groupon: $9 for 3-Day CTA Pass

2013-01-16 Thread Carmen Mitchell
Yeah, I'm taking the gamble. If it doesn't get to me in time, I will send
it to one of my Chicago area friends. Or maybe I will just visit again. :-)
If spending the $9 is going to blow up your budget, then don't risk it.


On Wed, Jan 16, 2013 at 8:31 AM, Bill Dueber wrote:

> I guess it depends on when you're leaving, but by my numbers it's more than
> three weeks until the conference...
>
>
> On Wed, Jan 16, 2013 at 11:22 AM, Wilhelmina Randtke wrote:
>
> > It says "Allow up to 3 weeks for delivery of CTA Pass."  This is better if
> > you are going to ALA over the summer, or something else more in the future.
> >
> > -Wilhelmina Randtke
> >
> >
> > On Wed, Jan 16, 2013 at 10:17 AM, Carmen Mitchell
> > wrote:
> >
> > > For the folks going to Chicago this year...This is a great deal.
> > >
> > >  $9 for a 3-Day Pass from the Chicago Transit Authority ($20 Value)
> > > http://www.groupon.com/deals/chicago-transit-authority-cta-3?utm_campaign=UserReferral_dp&utm_medium=email&utm_source=uu83298
> > >
> > > -Carmen
> > >
> >
>
>
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
>


[CODE4LIB] Groupon: $9 for 3-Day CTA Pass

2013-01-16 Thread Carmen Mitchell
For the folks going to Chicago this year...This is a great deal.

 $9 for a 3-Day Pass from the Chicago Transit Authority ($20 Value)
http://www.groupon.com/deals/chicago-transit-authority-cta-3?utm_campaign=UserReferral_dp&utm_medium=email&utm_source=uu83298

-Carmen